EDUCATIONAL and
PSYCHOLOGICAL
MEASUREMENT

A QUARTERLY JOURNAL DEVOTED TO THE DEVELOPMENT AND
APPLICATION OF MEASURES OF INDIVIDUAL DIFFERENCES

















EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT

Editor .... G. Frederic Kuder
Assistant Editor .... Marcia M. Mathews

ASSOCIATE EDITORS

Dorothy C. Adkins, University of North Carolina
John H. Rohrer, Editorial Representative of the American College
Personnel Association, University of Missouri
M. W. Richardson, Richardson, Bellows, Henry and Co.

BOARD OF COOPERATING EDITORS

John G. Darley
University of Minnesota

Harold A. Edgerton
Richardson, Bellows, Henry and Co.

Max D. Engelhart
Chicago City Junior Colleges

E. B. Greene
United States Employment Service

J. P. Guilford
University of Southern California

E. F. Lindquist
State University of Iowa

Charles I. Mosier
Personnel Research Section, A.G.O.

P. J. Rulon
Harvard University

David Segel
U. S. Office of Education

C. L. Shartle
Ohio State University

H. C. Taylor
The W. E. Upjohn Institute for Community Research

Thelma G. Thurstone
University of Chicago

Herbert A. Toops
Ohio State University

E. G. Williamson
University of Minnesota

Ben D. Wood
Columbia University

John R. Yale
Science Research Associates


This journal is open to: (1) discussions of problems in the field of the measurement
of individual differences, (2) reports of research on the development and use of tests
and measurements in education, industry, and government, (3) descriptions of testing
programs being used for various purposes, and (4) miscellaneous notes pertinent to
the measurement field, such as suggestions of new types of items or improved methods
of treating test data. Contributors receive one hundred reprints of their articles
without charge. Manuscripts should be sent to G. Frederic Kuder, Box 6907, College
Station, Durham, North Carolina. Writers are requested to include a biographical
sketch with each manuscript, following the style of the section on contributors
published in each issue.

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT is published
quarterly, one volume per calendar year, at Mount Royal and Guilford Avenues,
Baltimore 2, Maryland and Durham, North Carolina. Entered as second class
matter August 16, 1948, at the Post Office at Baltimore, Maryland, under the Act of
March 3, 1879.

Subscription rate, $5.00 a year, domestic and foreign. Single copies, $1.50, with
the exception of Volume VII, No. 3, Volume VIII, No. 3, and Volume IX, No. 3,
for which the price is $2.50 each. Back volumes: Volumes V (1945), VI (1946), VII
(1947), VIII (1948), and IX (1949), $6.00 each. Volumes I through IV are available
in a small-print edition at $3.00 per volume (paper bound).

Orders should be sent to EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, Box 6907, College Station, Durham, North Carolina.


Copyright, 1950 by G. Frederic Kuder




EDUCATIONAL and
PSYCHOLOGICAL
MEASUREMENT


The Theory and Classification of Criterion Bias. Hubert E. Brogden
and Erwin K. Taylor.

An Investigation of Two Hypotheses Regarding the Nature of the
Spatial-Relations and Visualization Factors. William B. Michael,
Wayne S. Zimmerman and J. P. Guilford.

On the Use of Interactions as "Error Terms" in the Analysis of Variance.
Allen L. Edwards.

The Objective Measurement of Dynamic Traits. R. B. Cattell, A. B.
Heist, P. A. Heist and R. G. Stewart.

The Construction and Validation of a Work-Type Auditory
Comprehension Reading Test. George Spache.

Validation and Standardization of the AGO General Mechanical
Aptitudes Test for the Selection of Civilian Employees in War Department
Installations. Adam Poruben, Jr.

Three Aids in the Evaluation of the Significance of the Difference
Between Percentages. C. H. Lawshe and P. C. Baker.

A Study of Faking on the Kuder Preference Record. Orrin H. Cross.

Psychological Testing for Immigrants in a Vocational Counseling
Agency. Benjamin Balinsky.

An Investigation of the Personality Traits of Art Students. Martin
Spiaggia.

The Knowledge of General Education of a Sample of Syracuse University
Students as Revealed by the Cooperative General Culture Test and the
Time Magazine Current Affairs Test. N. M. Downie, M. E.
Troyer and C. R. Pace.

The Full-Range Picture Vocabulary Test: II. Selection of Items for
Final Scales. Robert B. Ammons and Leo D. Rachiele.

Does Face Validity Exist? Sidney Adams.

Administering the Purdue Pegboard Test to Blind Individuals.
James W. Curtis.

Evaluating Psychometric Proficiency. Frank M. DuMas.

Interest and Personality Measures of Veteran and Non-Veteran
University Freshman Men. Katherine K. Fassett.




























EDUCATIONAL and
PSYCHOLOGICAL
MEASUREMENT


A Study of General Education at Syracuse University with
Special Attention to the Objectives. N. M. Downie, C. R.
Pace and M. E. Troyer.

Educational Growth as Shown by Retests on the Graduate Record
Examination. Joseph C. Heston.

The Assessment of the Academic Aptitude of the Graduate
Student. Robert M. W. Travers and Wimburn L. Wallace.

Measuring Originality in the Physical Sciences. Milton M.
Mandell.

Probability Approach to Forecasting University Success with
Measured Grades as the Criterion. L. J. Lins.

Preferences and Behavior Ratings of Dominance. William R.
Birge.

Reproducible Scales and the Assumption of Normality. Robert
G. Smith, Jr.

A Factorial Study of Beliefs. J. W. Holley and Claude E.
Buxton.

Opinion and Action: A Study in Validity of Attitude
Measurement. C. Robert Pace.

Estimating Intelligence by Interview. Joseph V. Hanna.

Inclusion of “None of These” Makes Spelling Items More
Difficult. Marcia Boynton.

A Table and an ABAC for Testing the Significance of Rho.
Frank M. DuMas.

Recent Publications Received.

























EDUCATIONAL and
PSYCHOLOGICAL
MEASUREMENT


TABLE OF CONTENTS

VOLUME TEN, NUMBER THREE, PART 2, 1950


The 1950 Convention Program of the American College Personnel
Association. 443

American College Personnel Association, Officers and
Committees. 448

Editors' Foreword. 450

Developments in Counseling by Faculty Advisers. Carroll
L. Miller. 451

Developments in Residence Hall Counseling. Merle M.
Ohlsen. 455

Developments in Counseling Bureaus and Clinics. Royal M.
Embree. 465

No Vain Imaginings. Thelma Mills. 476

Evaluation and Research in Group Dynamics. Kenneth F.
Herrold. 492

The Creation of an Effective Faculty Adviser Training Program
Through Group Procedures. Ira J. Gordon. 505

A Genetic Study of Sociality Patterns of College Women. David
S. Brody. 513

How to Go About the Process of Evaluating Student Personnel
Work. William M. Gilbert. 521

Major Limitations in Current Evaluation Studies. Ruth
Strang.

An Inventory of Student Reaction to Student Personnel Services.
Robert B. Kamm.

The Measurement of Student Conceptions of the Role of a College
Advisory System. Edgar Z. Friedenberg.

The Role of Student Government in the Student Personnel
Program. Brother Louis. 565

Student Personnel Work and the National Student Association.
Gordon Klopf. 577

Contributions of the Student Union to the Total Personnel
Program. Donovan D. Lancaster. 585

Major Issues and Trends in the Graduate Training of College
Personnel Workers. W. W. Blaesser and Clifford P.
Froelich. 588

Employment Outlook for the 1950 Crop of College Graduates.
Ewan Clague. 596

Our Stake in the Occupied Countries. Harold E. Snyder. 601

Plans for the New International Christian University in Japan.
Maurice E. Troyer. 603








EDUCATIONAL and
PSYCHOLOGICAL
MEASUREMENT


VOLUME TEN, NUMBER ONE, SPRING, 1950


Further Evidence on Response Sets and Test Design. Lee J.
Cronbach. 3

Client Acceptance of Self-Information in Counseling. Robert
B. Kamm and C. Gilbert Wrenn. 32

The Concepts of Reliability and Homogeneity. C. H. Coombs. 43

Problems in Measuring the Effectiveness of Professional
Education. Donald K. Beckley. 57

The Concept of Validity in the Interpretation of Test Scores.
Anne Anastasi. 67

The Logic of Scale Construction. Edward A. Suchman. 79

Validity, Reliability and Baloney. Edward E. Cureton. 94

Response Sets: A Note on Consistency in Taking Extreme
Positions. Edward A. Rundquist. 97

The Interests of Art Students. Walter R. Borg. 100

A Factorial Investigation of Flexibility. Robert W. Kleemeier
and Frank J. Dudek. 107

The Standardization of the Moore Eye-Hand Coordination and
Color Matching Test. Joseph E. Moore. 119

An Investigation of a Counselor Attitude Questionnaire. William
A. McClelland and H. Wallace Sinaiko. 128

A Note on Thurstone's Method of Computing the Inverse of a
Matrix. William C. Cottle. 134

Nomograph of Peters and Van Voorhis’ Approximation Formula
for Correcting Interfunction Correlation Coefficients for
Heterogeneity. William A. Reynolds. 137

A Single Chart for Tetrachoric r. William Leroy Jenkins. 142

New Tests. 14?

















FURTHER EVIDENCE ON RESPONSE SETS AND 

TEST DESIGN 


LEE J. CRONBACH 1 
University of Illinois 

When a person takes an objective test, he may bring to the
test a number of test-taking habits which affect his score.
Personal ways of responding to test items of a given form
(e.g., the tendency to say “agree” when given the alternatives
“agree”-“uncertain”-“disagree”) are frequently a source of
invalidity. In 1946, the writer (4) assembled evidence
demonstrating that these “response sets” are present in a wide
variety of tests. Since that time, much new evidence has come to
light, and it is now possible to examine more completely the
nature of response sets. While much of the material to be
reported is new, evidence has also been drawn from scattered
publications which were overlooked in the earlier review.
Material on response sets is to be found in a great many sorts of
studies, discussed under many names. Particular attention
should be drawn to the early reports of Lorge (15) and
Goodfellow (6) on this topic.

As our earlier report demonstrated, response sets have been
identified in tests of ability, personality, attitude, and interest,
and in rating scales. Among the most widely found sets are
acquiescence (tendency to say “True,” “Yes,” “Agree,” etc.),
evasiveness (tendency to say “Indifferent,” “Uncertain,”
etc.), and similar biases in favor of a particular response when
certain fixed alternatives are offered. Other sets include the
tendency to work for speed rather than accuracy, the tendency
to guess when uncertain, the tendency to check many items
in a checklist, etc. Response sets become most influential as
items become difficult or ambiguous. Individual differences
in response sets are consistent throughout a given test, as shown
by split-half coefficients. Response sets dilute a test with
factors not intended to form part of the test content, and so reduce
its logical validity. These sets may also reduce the test's
empirical validity. Response sets tend to reduce the range of
individual differences in score.

1 This study was assisted by funds from the Bureau of Research and Service,
College of Education.

The pattern of this discussion is as follows: First, many
studies are cited which bolster the conclusion that response
sets are widely found, and are particularly influential when
a test is difficult. These new sources confirm earlier findings
and do not modify them. The significant new material in this
section relates to two multiple-choice tests, and confirms the
hypothesis that this form of test is nearly free from response
sets. The second section of the report deals with the nature
of response sets. Questions considered are: Can performance
be altered by special directions or training to avoid response
biases? Are response sets consistent traits, so that a person
shows a similar set on different tests? Are response sets
correlated with other aspects of personality? These studies deal
particularly with the question whether response sets are due
to a transient mind-set and are therefore only a nuisance in
testing, or whether they may provide data on important
variables. The third and final section reviews methods used to
control the influence of response sets on validity, and discusses
what test constructors can do to design better tests.

Evidence that Response Sets Exist 

It is scarcely necessary to marshal further evidence that
reliable individual differences in response sets exist. Yet the
widespread use of test forms which permit response sets indicates
that their existence is not adequately appreciated. It is not
only the old tests (Seashore, Bernreuter, Thurstone attitude,
Strong) that suffer from response sets. New tests appear
continually, especially tests of attitude and personality, whose
forms invite response sets. The writer has routinely requested
graduate students to analyze their data for response sets
whenever their research employed tests with fixed response
categories (A-U-D, Yes-No-?, etc.). Never has such an analysis
failed to disclose individual patterns of response, statistically
consistent from item to item.





The most effective simple design to demonstrate response
sets is to obtain a score for each person on the suspected
response set. Thus, Lorge tested the existence of “gen-like,”
or acquiescence on the Strong test, by counting how many
items each person marked “L.” The split-half or Kuder-
Richardson reliability of the response-set score can then be
computed. Table I condenses the evidence obtained by this
and other techniques, evidence which, together with that
previously assembled, shows conclusively that response sets are
to be found in a great many tests.
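
The design just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the answer sheets are invented, the odd-even split and the Spearman-Brown step-up are one common way of obtaining a corrected split-half coefficient, and the function names are hypothetical rather than taken from any of the studies cited.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def response_set_halves(responses, target="L"):
    """Count the target response on odd- and even-numbered items separately."""
    odd = sum(1 for r in responses[0::2] if r == target)
    even = sum(1 for r in responses[1::2] if r == target)
    return odd, even


def split_half_reliability(all_responses, target="L"):
    """Correlate the two half-test counts, then apply Spearman-Brown."""
    halves = [response_set_halves(r, target) for r in all_responses]
    r_half = pearson([h[0] for h in halves], [h[1] for h in halves])
    return 2 * r_half / (1 + r_half)


# Hypothetical answer sheets (assumption): L = like, I = indifferent, D = dislike.
sheets = [
    list("LLLLLLIL"),
    list("LLILLDLL"),
    list("LIDLIDLI"),
    list("DIDDLIDD"),
    list("DDDIDDDD"),
]

print(round(split_half_reliability(sheets), 3))
```

A high coefficient here would mean only that persons who mark many “L” responses on one half of the items also do so on the other half; it says nothing about what the tendency reflects.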

One study requires a separate report, because it is based on a
factorially-designed test in which items are intended to be
homogeneous. Kenneth Eells supplied the writer with tests
“Cards” and “Figures,” from Thurstone’s Tests of Primary
Mental Abilities, which had been given to pupils in a
Midwestern city as part of a study by the University of Chicago
Committee on Cultural Factors in Intelligence Tests, under a
grant from the General Education Board. Both of these tests
present a geometric figure at the left of the row, and follow it
with figures just like the given one save that they have been
rotated through 90°, 180°, or 270°, or are mirror-images of one
of these rotations. Directions are to “mark every card (figure)
that is like the first card (figure).” It was observed that some
pupils seem to search for all correct answers, whereas others are
content to identify one or two seemingly correct answers, and
then go on to the following row. Papers were drawn at random
from those given to all pupils in two large junior high schools.
Papers were discarded where any row had been omitted, or
where the total score on Cards was high (46 or more out of 54
possible). This avoids spuriously high apparent reliability for
the response-set score. The test had been given with double
time, and the test was in effect unspeeded for the pupils studied.
Two response-set scores were obtained for each pupil: Cards
R + W, and Figures R + W. This score indicates a tendency to
mark many items in a row. It implies thoroughness and
persistence in marking, and perhaps acquiescence. The correlation
of the two R + W scores is .54 (N = 109). On the whole, those
who mark fewer items appear to be poorer students, but no
estimate of response sets, independent of ability, could be
obtained. For the selected cases, the correlation of R + W Cards
with R − W Figures was .44, and that of R + W Figures
with R − W Cards was .33. These data are interpreted as
showing that in addition to the space factor (ability to
discriminate similar forms), performance on this test is influenced by
a response set. Many students are found who mark few or no
incorrect figures (R + W = R − W) but who fail to mark all
the correct alternatives. Since some of the reliable response-set
variance is uncorrelated with the space factor, the entrance of
response sets reduces the factorial purity of the test. Certainly
tests which aim at measurement of a single factor must be
designed to eliminate response sets.

TABLE I
Studies Reporting Response Sets
[Table contents not recoverable from the scanned source.]
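
The R + W and R − W scores used in the Cards and Figures analysis can be illustrated with a small sketch. The key and the two answer sheets below are invented for illustration, and representing each row as a set of marked positions is an assumption of this example, not a description of how the original papers were scored.

```python
def score_marks(marked, correct):
    """Return (R, W): right and wrong marks summed over all rows.

    `marked` and `correct` are lists of sets, one set of option
    positions per row of the test.
    """
    right = sum(len(m & c) for m, c in zip(marked, correct))
    wrong = sum(len(m - c) for m, c in zip(marked, correct))
    return right, wrong


# Hypothetical three-row key and two hypothetical pupils (assumptions).
key = [{0, 2, 5}, {1, 3}, {0, 4, 5}]
thorough = [{0, 2, 5}, {1, 3, 4}, {0, 4, 5}]  # searches each row fully
cautious = [{0}, {1}, {4}]                    # stops after one mark per row

for name, sheet in (("thorough", thorough), ("cautious", cautious)):
    r, w = score_marks(sheet, key)
    print(name, "R+W =", r + w, "R-W =", r - w)
```

The second pupil marks nothing incorrectly, so R + W equals R − W, yet leaves most correct alternatives unmarked. This is the pattern the text attributes to a response set rather than to the space factor.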

Response Sets in Multiple-Choice Tests. The only major form
of fixed-alternative test which has so far been found free from
response sets is the multiple-choice item. In order to determine
whether response sets can be extracted from a typical test
of this type, the writer has studied the Henmon-Nelson Test
of Mental Ability, Form A, for Grades 3-8. The data for this
study were supplied by Eells, from the study which provided
the Thurstone data discussed above. Thousands of test papers
were available, since every child in several grades in a
midwestern city had been tested. The sample for this study was
chosen indiscriminately, from papers of upper-lower and
lower-middle-class children. In administering the test experimentally,
Eells allowed an extended time of 20 minutes beyond the
standard time of 30 minutes. Papers not completed even in
the extended time were discarded in the present analysis.

The Henmon-Nelson is a suitable test for investigating
response sets because items were prepared with care, are fairly
well arranged as to difficulty, and are designed so that the
correct answer appears about equally often in each of the
five response-positions. The hypothesis is that some students
may persistently tend to select choices early in the group of
five. This would raise their scores on items where the correct
answer is choice “1” or “2,” but lower them on items keyed
“4” or “5.” The psychological basis for the hypothesis is the
possibility that some students read every alternative and
discriminate carefully, whereas some merely read through the item
to find a plausible answer, mark it, and go on to the next item.

The procedure was the usual one: to obtain a “bias” score
for each individual and determine its reliability. If the score
is reliable, the response set is proved to exist. The response-set
score for the present hypothesis consists of “number of errors
appearing to the left of the correct answer” minus “number of
errors to the right of the correct answer.” Before rescoring
papers for bias, papers of high-scoring pupils (those having
a score above 60 out of 90 items correct) were discarded. This
was done to increase the likelihood of finding a response set,
since response sets have no opportunity to show themselves
when the pupil gets most items correct. For a group of 66
papers, bias scores ranged from 24 to −12. The person with
the bias score 24 had made 39 errors to the left of the true
answer, and only 15 errors to the right of the true answer.
Such a preponderance is hard to explain as other than a habit
of marking items. For the cases studied, however, the
split-half reliability of the bias score was only .095, corrected. Such
a low correlation indicates that the postulated response set
is of no consequence for this group. A second sample of 84
cases having raw scores of 40 or below in extended time (these
pupils had IQ’s near or below 80) were studied separately,
in order to increase the probability of finding a response set.
For these pupils, the reliability of the bias score was .42,
corrected. Evidently for a group of pupils taking a difficult
multiple-choice test, reliable response sets can be found. Bias
has a slight relation to raw score; the mean raw score for these
poor pupils was 24.5 for those with negative bias, and 29 for
those with positive bias. For some reason, very poor students
tended to mark alternatives to the right of the correct answer
proportionately more often than slightly better pupils.
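
The bias score defined above is simple to compute once each pupil's chosen position and the keyed position are known for every item. The sketch below uses an invented eight-item key and answer sheet; the function name and the data are assumptions made for illustration.

```python
def bias_score(chosen, keyed):
    """Errors to the left of the keyed answer minus errors to the right.

    `chosen` and `keyed` hold option positions (1-5), one per item.
    Correct answers (chosen == keyed) contribute nothing.
    """
    left = sum(1 for c, k in zip(chosen, keyed) if c < k)
    right = sum(1 for c, k in zip(chosen, keyed) if c > k)
    return left - right


# Hypothetical key and answer sheet (assumptions, not data from the study).
keyed = [3, 1, 5, 2, 4, 3, 5, 1]
chosen = [1, 1, 2, 2, 1, 4, 3, 2]

print(bias_score(chosen, keyed))
```

By the same definition, the pupil reported in the text with 39 errors to the left and 15 to the right receives 39 − 15 = 24. Note that an item keyed “1” can only produce errors to the right, and an item keyed “5” only errors to the left, so the score is meaningful only because the key is balanced across positions.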

An attempt was made to demonstrate such biases as
“preference for position 1.” No statistical evidence for such sets could
be obtained, although an occasional case does suggest that
such biases may occur. One boy, for example, never in 90
items marks the fifth choice as correct, and another student
places 30 of his marks on position “1.”

A second study was made with a modified version of the
Ohio State University Psychological Examination, using data
made available by N. L. Gage and Dora Damrin. The
shortened test they used consists of 90 five-choice vocabulary
items, unspeeded. This test was administered to unselected
juniors and seniors in several high schools. When papers for all
171 pupils were scored for tendency to place answers before
rather than after the correct position, the odd-even reliability
of the bias score was found to be .20. When only the lowest 65
students (as judged by the total number right on the test) were used
as a sample to determine the reliability of the bias score, the
reliability rose to .29. This was a group of students for whom the
test was extremely difficult; the highest score for the group was
22 right out of 90. It should be noted that this test is normally
used for predicting college success among superior high-school
students; the highest score in this limited subdivision of our
sample is only chance expectation. When an even more restricted
sample was used (the lowest 26 cases, all of whom fell below a
raw score of 15 items correct), the reliability of the bias score
rose to .54. The mean bias score changed as the quality of
students became poorer. For the total group, the mean bias score
was −6.5; for the second group, −7.7; and for the very lowest
group, −9.7. Here, also, the poorest students apparently tended
particularly often to mark errors to the right of the correct
answer.
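
The remark that a score of 22 out of 90 is “only chance expectation” can be checked with a short calculation. The sketch assumes blind, independent guessing among five equally attractive choices, which is an idealization introduced here and not a claim made in the study.

```python
import math

# Assumption: every item is answered by blind guessing among 5 choices.
n_items = 90
n_choices = 5
p = 1 / n_choices

expected = n_items * p                   # mean number right by chance
sd = math.sqrt(n_items * p * (1 - p))    # binomial standard deviation
z = (22 - expected) / sd                 # highest observed score, in SD units

print(f"chance mean = {expected:.1f}, SD = {sd:.2f}, z for 22 right = {z:.2f}")
```

Under this assumption the chance mean is 18 items, and a score of 22 lies roughly one standard deviation above it, which is consistent with describing the best score in the lowest group as within chance expectation.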

Both of these studies demonstrate that response sets are a
minor factor, since so great a selection of cases was required
in order to demonstrate any evidence of bias. Probably other
multiple-choice tests where all subjects mark all items suffer
little from response sets. Confirming studies on other
multiple-choice tests are desirable, but the generally satisfactory
experience with forced-choice tests should encourage their
continued widespread use.

Stability of Response Sets 

While there is ample evidence that response sets are
consistent throughout a single test, it is important to determine
whether they are characteristics of the individual stable from
time to time, or are transient sets which can only be regarded
as errors in testing rather than personality characteristics.

Some evidence that response sets are stable appears in
scattered studies. Thorndike (22, p. 33) reports that on a speeded
Air Force test, scores obtained at the same sitting correlate no
more than scores obtained several hours apart. If a
speed-accuracy set is operating, it is not a set which shifts from hour
to hour. Singer and Young (21) found that a tendency to rate
varied stimuli as “pleasant” was highly stable, correlations as
high as .90 being found under certain conditions over time
intervals of two weeks.





Whereas these and similar studies tend to stress the stability
in response sets, we ordinarily think of mental sets as easily
changed by suitable directions. If the response set is viewed as
a way of interpreting an ambiguous situation, as when the word
“like” is left for the subject to define, any change in directions
should re-define the stimulus elements and alter individual
response sets. Several studies show that this can be done.

Rubin (20) several years ago demonstrated the existence of
bias in the Seashore Pitch Test. He gave the Revised Test B
twice to 245 college students, and found that the group as a
whole used 13958 “H” responses and only 10542 “L” responses,
in judging whether the second tone was higher or lower.
According to the key, there were actually an equal number of
differences in each direction. A similar mean bias was found
by Rubin in data of Farnsworth.
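
The size of this group bias can be read directly from Rubin's counts. The short Python sketch below only restates that arithmetic. The variable names are ours, and the assumption that every student marked every item on two 50-item administrations is inferred from the totals rather than stated in the report.

```python
# Response counts reported for Rubin's 245 students (two administrations each).
high_responses = 13958
low_responses = 10542
students = 245
administrations = 2

total_responses = high_responses + low_responses
share_high = high_responses / total_responses

# Bias score in the sense of the Table 2 footnote (High minus Low),
# averaged over every student-by-administration record.
mean_bias = (high_responses - low_responses) / (students * administrations)

print(total_responses)        # 24500
print(round(share_high, 3))   # 0.57
print(round(mean_bias, 2))    # 6.97
```

On these figures about 57 per cent of all responses were “H,” an average excess of roughly seven “H” responses per 50-item administration, although the key called for equal numbers.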

In two ingenious studies Rubin then established that temporary
sets are a major element in bias. First he gave a “guessing”
test, in which subjects imagined a tossed coin, and wrote
down the way they imagined it would fall. One group was given
directions as follows: “Imagine a coin which has an H for High
on one side, and an L for Low on the other side.” In the other
group this was reversed: “Imagine a coin which has an L for
Low on one side, and an H for High on the other side.” There
was a significant preponderance of the first-mentioned response
on the first guessed item (i.e., the former group tended to say
“H,” the second group to say “L”). There was a significant
preponderance of the second-named response on the third guess
of the series. Rubin then applied the same reversal to the
Seashore test directions. 272 students were told, “If the second
tone is lower than the first tone, print L; if higher, print H.”
Only 56.8 per cent of the errors were lows marked “H,”
compared to 60.0 per cent when much the same group were given
the original directions (but note that some bias remained).

A miniature experiment performed by graduate students as
a class exercise gives further indication that response sets are
easily altered. Lynn Henderson and Esther Williams administered
the revised Seashore Pitch Record B to ten students,
repeating the Record to make a total of 100 items. At the next
class meeting, each student’s scored paper was returned to him
for brief study. His attention was drawn specifically to the
nature of bias by having him count whether he tended to mark
“H” more often than “L.” He was informed that in each group
of ten items, just half were correctly answered “High.” The
writer conducted the discussion, talking about bias for about
fifteen minutes and suggesting strongly that bias could be
eliminated with effort and that pitch scores would be improved as a
result. Papers were collected as soon as bias had been examined, to
reduce the possibility of learning specific answers. Students were
never informed, and few suspected, that the same record was
used for both items 1 to 50 and items 51 to 100. After the discussion,

TABLE 2

Results of Pitch Tests before and after Discussion of Bias

              Score on            Bias on
           successive tests    successive tests*    Total score   Total bias
Student    IA   IB  IIA  IIB    IA   IB  IIA  IIB      I    II      I    II

1          4?   44   42   46    -2   -8   -4    0     85    89    -10    -4
2          45   39   4?   43    -1   10    2   -2     ?4    84      8     0
3          40   42   44   45    -8    ?    8   -2     82    89    -11     6
4          35   4?   35   33     2    2    2    ?      ?    68     -4     4
5          3?   3?   29   32     4   -4   -2    4     74     ?      0     2
6          31   37   3?   48   -14    2    2    0     68    79    -12     2
7          30   32   42   39     0    0   -4   -2     62    81      0    -6
8           ?   31   32   3?     4    1    0    0     55    68      6     0
9           ?   27   33   32   -24  -16  -10   -8     51    65    -50   -18
10         21    ?   30   28   -12   -6    0    0     47    58    -18     0

Median     33    ?   34  37½    -2   -4    0    0     7?   73½     -7     0
Mean (scores); Mdn. absolute value (bias)
           33   35   36   38     4    5    2    2     68    74      9     3

* Bias score equals number of items marked High minus number marked Low.
? marks an entry or digit that is illegible in the source copy.


the 50-item record was readministered, the papers collected,
and the record readministered again, yielding a 100-item
post-test. This is admittedly an inadequate experiment, especially
in the absence of a control group to measure the effect of
practice and suggestion, separated from training regarding bias.
The results are nevertheless striking (Table 2). Bias was notable
on Tests IA and IB, largely eliminated on IIA and IIB. Total
scores generally rose, especially on IIB. The amount of gain
in score corresponds somewhat to the amount of initial bias,
except for case 7, whose gain is presumably an effect of
practice or motivation. This finding is not statistically significant.
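
The two quantities reported in Table 2 are simple counts, and a minimal Python sketch may make the definitions concrete. The function names, the ten-item key, and the sample answers are hypothetical illustrations, not data from the class exercise; only the definition of the bias score (items marked High minus items marked Low) comes from the table footnote.

```python
def bias_score(responses):
    """Number of items marked High minus number marked Low."""
    highs = sum(1 for r in responses if r == "H")
    lows = sum(1 for r in responses if r == "L")
    return highs - lows


def pitch_score(responses, key):
    """Number of responses that agree with the scoring key."""
    return sum(1 for r, k in zip(responses, key) if r == k)


# Hypothetical ten-item block in which just half the items are keyed High.
key = ["H", "L"] * 5
answers = ["H", "H", "H", "L", "H", "H", "H", "L", "H", "H"]

print(bias_score(answers))        # 6
print(pitch_score(answers, key))  # 7
```

A subject who marks every item and has no bias obtains a bias score of zero on such a balanced key, whatever his pitch score.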

This study, small as it is, seems to show that bias can be
eliminated by direct coaching which makes the subject aware
of his own bias. If the Pitch Test measured pitch threshold
alone, increased insight into habits of responding would not
affect scores. The study does not prove that training in bias
raises pitch scores, but it strongly suggests that this is true.
Wyatt (28) also reports training subjects to avoid bias as a
means of improving discrimination. Surely, on the basis of these
data, it can be recommended that Seashore test papers should
be checked for bias, and that where the person shows a marked
bias in either direction scores should be regarded as probably
giving too low an estimate of the person’s ability to discriminate
pitch.

Another report that altering directions affects response sets
is made by Goodfellow (6). He finds that in psychophysical
judgments the predisposition to report a stimulus as absent
was reversed when the directions were worded: “Remember
that in approximately one-half of the trials the correct answer
will be yes.”

The resemblance between response sets inferred from statistical
data and “learning sets” found experimentally by Harlow
(9) should be pointed out. In studies of monkeys, and also of
children, he established definite evidence of generalized
learning to solve problems. The monkey enters an ambiguous
situation, namely, a discrimination apparatus where the proper
choice among two alternatives leads to a food reward. In this
situation, a personal communication from Harlow informs us,
the monkey demonstrates a preference for one or another of
the choices offered (e.g., for the red object rather than the
blue). This preference may serve to increase errors (if, for
instance, the square object has been keyed as correct, regardless
of color). If the monkey is put through one learning series
after another, in which a different cue differentiates the right
and wrong choices in each series, the monkey quickly learns to
learn. His learning curve on later series is strikingly steep.
“With each successive block of problems the frequencies of
errors attributable to these factors [one of which is initial
preference or response set] are progressively decreased. . . . The
process might be conceived of as a learning of response
tendencies that counteract the error-producing factors.”





Harlow has therefore shown that response sets are present
in the new, ambiguous situation, and that under his conditions
they are extinguished. In contrast, the test-taking sets of adults
appear not to be extinguished by usual experiences, even though
they increase the probability of error. The difference appears
to be that in Harlow’s experiment there is an immediate
frustration attached directly to the wrong (preference-determined)
response. In school tests the penalty is delayed, and is usually
attached to the total test performance rather than to the
specifically wrong responses. False approaches to problems,
such as biases, can be eliminated; sound sets, such as reading


TABLE 3

Correlation of Response Sets on Varied Tests

Investigator        Tests                    Response Set        Findings

Lorge (15)          Bernreuter,              Acquiescence        Average intercorrelation of
                    Thurstone attitude,                          number of Yes's: .14.
                    Strong                   Evasiveness         Average intercorrelation of
                                                                 number of I's or ?'s: .41.

Singer-Young (21)   Two series of tones      Tendency to rate    r's .?8 (first digit
                                             “pleasant”          illegible), .67.

                    Two series of words      Tendency to rate    r's range .44 to .59.
                                             “pleasant”

                    Two series of            Tendency to rate    r's range -.34 to .36.
                    different                “pleasant”
                    stimulus-types

AAF (7)             Wrongs score on four     Carefulness vs.     r's range .14 to .41.
                    tests of plotting,       speed
                    etc.


each item carefully, can be learned. But direct and immediate
teaching will be more effective than such incidental
punishments as low total scores.

Generality of Response Sets 

To some degree, a person shows consistent response sets
from situation to situation. Table 3 summarizes studies bearing
on this question. When similar situations are presented,
response set scores are significantly correlated. But there is no
evidence that response sets are consistent over widely different
situations, and Singer and Young’s evidence indicates that this
is not true. But one does not measure response sets alone.
Response sets show only when the response to a situation is in
some way unclear. Singer and Young point out that habits of
using their rating scale are operative only when “affective
arousal is weak or absent.” Perhaps affective arousal is weak
for one person on tones, for another on odors. This would
reduce the response-set correlations.

Response sets might be mere incidental sources of error in
measurement, or they might reflect deeper personality traits.
Evidence from many sources now combines to show that
response sets reflect “real” variables.

Johnston (13) gave the Bernreuter Inventory and the Hunter
Attitude Scale to two groups of teachers. These groups were
chosen on the basis of ratings by their principals, so that one
group consisted of “autocratic” teachers, and one consisted of
teachers who were markedly “democratic” in classroom
practice. Johnston found that these groups differed significantly in
response sets. On the Bernreuter, the autocratic group gave an
average of 52.6 “Yes,” 62.3 “No,” and 10.8 “?” responses. The
three totals for the democratic group were 55.9, 66.8, and 4.7
respectively. There were 42 teachers in the former group, and
43 in the latter. The difference in “tendency to use question
marks” (evasion?) was significant (P < .01). There was a
similar difference on the Hunter scale. The mean number of
statements marked “Undecided” rather than “Agree” or “Disagree”
was 15 in the autocratic group and 10 in the democratic group
(P < .01).

Mersman (17), in a small study of vocational interests,
compared the Bernreuter responses of college students planning to
be lawyers, musicians, and engineers. There were seventy-five
cases in each group. Upon analyzing the number of responses
of each type in each group, he found the following means:

                 Yes    No     ?

Lawyers           53    62    10
Musicians         56    58    11
Engineers         54    64    [illegible]

The differences between engineers and musicians are significant
(1% level).

Evidently groups differentiated on external criteria also differ
in response sets. Where this is so, part of the response-set
variance must represent some real variable. For example, use
of question marks may indicate anxiety and evasiveness of
personality, rather than a transient set alone. Lorge (15) finds
that the tendency to say “Yes,” “No,” and “?” (estimated
from several tests) correlates as follows with scores on the
Flanagan-Bernreuter keys:

                 Yes     No      ?

Confidence       .27    -.15    -.03
Sociability      .00     .27    -.26


Possible significance of response sets for empirical prediction
is suggested by a study which finds that tendency to respond
“?” is correlated negatively with success in selling life
insurance (14). While the relationship found was not statistically
significant, the difference between the mean number of
question marks in the good and poor groups (8.4 vs. 12.8, CR 1.57)
is large enough to suggest further investigation along this line.
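
To see why a critical ratio of 1.57 falls short of significance, the ratio may be referred to the normal curve. The Python sketch below does this under the assumption, ours and not stated in the study, that the CR can be treated as a unit normal deviate and that a two-tailed test is wanted; the function name is hypothetical.

```python
import math


def two_tailed_p(critical_ratio):
    """Two-tailed normal-curve probability for a critical ratio (assumed z)."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2))


p = two_tailed_p(1.57)
print(round(p, 3))  # 0.116
```

A difference this large would arise by chance somewhat more than one time in ten, well above the conventional 5 per cent level, which a ratio of about 1.96 would be needed to reach.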

Improvement of Test Design 

The heterogeneous bits of evidence pieced together here and 
in our previous report have established several generalizations. 

1. Any objective test form in which the subject marks fixed
response alternatives (“Yes”-“No,” “True”-“False,” “a”-“b,”
etc.) permits the operation of individual differences in response
sets. The influence of response sets in the multiple-choice test
is, however, of minor importance.

2. Response sets have the greatest variance in tests which
are difficult for the subjects tested, or where the subject is
uncertain how to respond.

3. Items having the same ostensible content actually measure
more than one trait, if response sets operate in the test. This is
true even for tests which, scored as a whole, are “factorially
pure.”

4. Slight alterations in directions, or training in test-taking,
alter markedly the influence of response sets. But if the
situation is not re-structured by the tester, individual differences in
response set remain somewhat stable when similar tests are
given at different times.

5. Response sets are to a small degree correlated with
external variables such as attitudes, interests, and personality.
This shows that they are in part a reflection of “real” and
stable traits. To this degree, response-set variance may be
valid variance in some investigations.

6. Tests are usually constructed to measure a trait defined
by the content of the test items. If the form of the items
permits response sets, two persons having equal true scores on
the content factor will often receive different scores on the test.
Response sets therefore ordinarily dilute the test and lower its
validity.

Paragraphs (5) and (6) crystallize the paradox response sets
present. Some of the response-set variance is potentially useful,
some of it is an interference with measurement. The problem
for the tester is to capitalize on the effect of response sets where
they are helpful to validity, and to eliminate their influence
where it is undesirable. It is therefore important to decide
which view is to be taken in any given situation. The writer
has attempted to formulate rationally the response-set problem
in factorial terms. The analysis has been unsuccessful,
primarily because response sets do not obey the fundamental
additive law of factor theory. One cannot define a person’s test
score as a weighted addition of his content-factor and
response-set-factor scores, since response sets have an influence on his
performance on each item proportional to his doubtfulness.
That is, the weight for the response-set factor in any item is
not a constant for all persons, but is a function of each person’s
score in the content factor. Since the problem is not at present
formulated analytically in a way which clarifies our thinking,
we are confined to a general description of the relations.
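
The difficulty can be stated schematically. The notation below is ours and purely illustrative, since no formula is given in the text: $c_i$ and $s_i$ stand for person $i$'s content-factor and response-set-factor scores, $x_{ij}$ for his score on item $j$, and $d_j(\cdot)$ for a hypothetical "doubtfulness" weight.

```latex
% Additive law of factor theory: the weights a_j and b_j are constants for item j.
x_{ij} = a_j\, c_i + b_j\, s_i + e_{ij}

% Response sets as described above: the response-set weight is not a constant
% but a function of the person's content-factor score (his doubtfulness).
x_{ij} = a_j\, c_i + d_j(c_i)\, s_i + e_{ij}
```

In the second expression the term $d_j(c_i)\, s_i$ involves both factors at once, so the score is no longer a weighted sum with fixed weights, which is the sense in which the additive law fails.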

Considering only biases such as acquiescence and evasiveness,
response-set variance may be conceived as containing the
following elements, combined in some proportion:

1. Chance variance; resulting from purely random excess of 
choice of one or another alternative. 

2. Internally consistent but momentary response tendencies; 
sets operating throughout one testing, but shifting on a 
retest at another time. 

3. Stable response tendencies; sets operating consistently even 
when the same test is given at different times. 

Evidence of the existence of Type 3 variance has been
consistently found whenever investigators have sought it. Evidence
for Type 2 variance is lacking, but it may be postulated on the
grounds that no observed trait is expected to be perfectly
stable. And of course chance variance is always with us.

Response-set variance of Type 1 is not important; it is
simply another manifestation of error variance, and its influence
can be reduced by lengthening the test. Variance of Type 2 is
unquestionably harmful, unless one happens to be doing
research on evanescent sets or moods or some other fluctuating
variable (for example, a study of mood changes concomitant
with fatigue). Type 2 variance cannot correlate with stable
variables, and therefore lowers the validity coefficient of the test.
Moreover, Type 2 variance is present in many items and
probably increases the coefficient of equivalence (split-half or
Kuder-Richardson reliability) of the test. Therefore, even if the test
given on a particular day were lengthened indefinitely, we
could not raise its empirical validity to 1.00 because scores are
partly saturated with an invalid factor. Type 3 variance is
potentially useful, but to understand its action we must divide
it between

3a. Valid variance, the portion of 3 that correlates with the 
criterion the test is intended to predict, and 

3b. Invalid variance, the portion of 3 that does not correlate 
with the criterion. 

We may always expect a portion of Type 3b, since the response 
set could correlate perfectly with the criterion only if the 
criterion is itself a set or a personality trait causing the set. 
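
The classification may be summarized in a single expression. This is a sketch in our own notation, and it adds the assumption, not made in the text, that the four elements are uncorrelated so that their variances simply sum.

```latex
% Response-set variance split into the elements named above:
% 1 = chance, 2 = momentary sets, 3a = stable and valid, 3b = stable but invalid.
\sigma^2_{\text{response set}}
  = \sigma^2_{1} + \sigma^2_{2} + \sigma^2_{3a} + \sigma^2_{3b}
```

Only $\sigma^2_{3a}$ can correlate with the criterion; the other three terms can only dilute the score.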

Variance of Type 3a does exist, since in some studies the
response-set score did correlate with some external variable.
Moreover, research in a good many fields is turning to
personality variables which may be close cousins to response sets.
Guilford anticipates that the “carefulness” factor, which is a
response set, may prove to have validity as a component of a
battery for aircrew selection. In studies of prejudice or
liberalism, an investigator may find evidence on negativism useful.
And this is possibly one source of bias toward “No” and
“Disagree” in taking tests. Variance of Type 3b reduces validity,
and limits the maximum possible validity the test can have
even if trials on different days are combined. Variance of Type
3a may increase validity if it is added into the score in one
way, or it may lower validity if it is added in differently. Thus
the studies of true-false tests (5) show that students tend to
say “True” when in doubt, and the duller students, who are
in doubt most often, say “True” most often. This raises their
score on true items, lowers it on false items. Hence the
potentially valid portion of the response-set variance lowers the
discriminating power and validity of true items, and enhances the
validity of the false items.
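
The effect on true and false items can be illustrated with a little arithmetic. The Python sketch below is a hypothetical model of our own, not taken from the studies cited: each student answers an item from knowledge with probability `p_know` and otherwise says “True” with probability `p_say_true`. All numerical values are invented for illustration.

```python
def p_correct(p_know, p_say_true, item_is_true):
    """Chance of a correct mark for a student who knows the answer with
    probability p_know and otherwise marks "True" with probability p_say_true."""
    guess_right = p_say_true if item_is_true else 1.0 - p_say_true
    return p_know + (1.0 - p_know) * guess_right


# Hypothetical students: both say "True" 70 per cent of the time when in doubt,
# but the duller student is in doubt more often.
bright, dull, acquiescence = 0.8, 0.4, 0.7

gap_true = p_correct(bright, acquiescence, True) - p_correct(dull, acquiescence, True)
gap_false = p_correct(bright, acquiescence, False) - p_correct(dull, acquiescence, False)

print(round(gap_true, 2), round(gap_false, 2))  # 0.12 0.28
```

Under these assumed values the bright and dull students are separated by only .12 on a true item but by .28 on a false item, which is the sense in which acquiescence blunts the true items and sharpens the false ones.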

Finally, it should be noted that there is no possibility of
separating the four types of response-set variance in data from
a single test; they come entangled in a single performance, and
we must therefore consider the effect of the response-set
variance as a unit. This total is made up of a random element
(Type 1), a real but invalid element (Types 2, 3b), and a
potentially valid element (3a) which may in practice raise or lower
the validity of the test score. Of these three categories, only
3a, the valid variance, is likely to be entirely absent, and the
size of the correlations of response sets with external variables
suggests that 3a is not likely to be the principal component of
the variance. Therefore:

a. The probable effect of response-set variance is harmful,
since elements 2 and 3b are usually present, and these
elements reduce the extent to which the test is saturated with
the content factor it is supposed to measure.

b. Even if valid variance is present, its effect may be to lower
validity of some items or of the total score. But under
certain circumstances, it may be treated in such a way that
it raises the validity coefficient.

c. Only under exceptional circumstances, when a test is
designed to study the very personality characteristics which
are reflected in the response set, does the response set appear
to be a potentially helpful source of variance.

Because the operation of response sets upon score is complex,
a detailed illustration seems worthwhile. A spelling test
is planned, using the directions: “Some of these words are
correctly spelled and some incorrect. Mark every item, + if
correct, o if incorrect.” If the test is intended to indicate whether
the student will identify errors in his own writing outside of
school, this form of item has an appealing resemblance to the
criterion task. Now suppose we have 6 students. A, B, and C
know 40 words out of 60, and are doubtful on the remainder. D,
E, and F know 30 words. (This oversimplification of “knowing”
a word avoids difficulty in this explanation.) A and D have no
response set. Of the 60 words, just half are wrongly spelled,
and when A and D are doubtful, they mark just half of the
unknown words o. B and E are a little undercritical in
non-school writing; they fail to notice some errors. But in taking a
school test, they suspect the teacher of planting errors where
there are none, and so mark o 60 per cent of the time when
they are doubtful. C and F are undercritical in all their writing,
and in taking the test they are also willing to accept errors;
they mark o only 30 per cent of the time when in doubt. The
scores then may develop as follows:

                                 A      B      C      D      E      F

Bias (proportion of
  + responses to
  o responses)                 50/50  40/60  70/30  50/50  40/60  70/30

Words known                      40     40     40     30     30     30

Guesses correct by chance:
  guessed +                       5      4      7     7½      6    10½
  guessed o                       5      6      3     7½      9     4½

Most probable score              50     50     50     45     45     45

Maximum possible correct
guesses:
  guessed +                      10      8     10     15     12     15
  guessed o                      10     10      6     15     15      9

Maximum possible score           60     58     56     60     57     54

Minimum possible correct
guesses:
  guessed +                       0      0      4      0      0      6
  guessed o                       0      2      0      0      3      0

Minimum possible score           40     42     44     30     33     36


In this, as in other problems, the tendency is for bias to restrict
the range of scores, not to alter the mean score. Where an
unbiased person may, with lucky guesses, earn a very high score,
the biased person has a much smaller probability of reaching
the same total. Bias which reflects “true criticalness” operates
in the score no differently from bias which is only a special set
used in taking a test. If the items are divided so that 70 per
cent of the words are correctly spelled, C and F are given an
advantage, even over A and D. If more than half the spellings
are incorrect, B and E will tend to earn higher scores than
those who know an equal number of words (and are equal on
the criterion).
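
The entries in the illustration follow mechanically from its premises, and a short Python sketch can reproduce them. The function and parameter names are ours; the sketch assumes, as the illustration does, that known words are always marked correctly and that a student's guesses on doubtful words are unrelated to the true spelling. The parameter `share_correct` (the proportion of doubtful words that are in fact correctly spelled) is introduced here so that the 70 per cent case can also be tried.

```python
from fractions import Fraction


def guess_outcomes(known, total=60, plus_rate=Fraction(1, 2),
                   share_correct=Fraction(1, 2)):
    """Most probable, maximum, and minimum scores for one student.

    plus_rate     -- proportion of doubtful words the student marks "+"
    share_correct -- proportion of doubtful words that really are correct
    """
    doubtful = total - known
    n_plus = plus_rate * doubtful          # doubtful words marked "+"
    n_zero = doubtful - n_plus             # doubtful words marked "o"
    true_plus = share_correct * doubtful   # doubtful words spelled correctly
    true_zero = doubtful - true_plus       # doubtful words spelled wrongly

    expected = n_plus * share_correct + n_zero * (1 - share_correct)
    best = min(n_plus, true_plus) + min(n_zero, true_zero)
    worst = max(0, n_plus - true_zero) + max(0, n_zero - true_plus)
    return {"expected": known + expected,
            "max": known + best,
            "min": known + worst}


student_b = guess_outcomes(40, plus_rate=Fraction(4, 10))
student_f = guess_outcomes(30, plus_rate=Fraction(7, 10))
print(student_b)  # expected 50, max 58, min 42
print(student_f)  # expected 45, max 54, min 36
```

With `share_correct=Fraction(7, 10)` the same function gives student C (who marks + 70 per cent of the time) a higher expected score than the unbiased student A, in line with the remark that an unbalanced key rewards the matching bias.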

In an unbiased test, where all alternatives have an equal
weight in the total test, response sets do not add to the variance
of scores, but have a damping effect, reducing the range of
points people may earn from a combination of guessing and
partial knowledge. If one alternative is present more than
another, response sets form part of the variance of the test
scores.

Methods of eliminating response-set variance. The writer
concludes that as a general principle, the tester should consider
response sets an enemy to validity. Even when seeking to
measure a trait resembling a response set, one can have
confidence in the meaningfulness of the score only after showing
that variances 1, 2, and 3b are small in proportion to 3a.
Therefore, in most tests and certainly in those not intended to
measure personality, we should keep response sets from
affecting the test score by one of the following methods: designing
test items which prevent response sets, altering directions to
reduce response sets, or correcting for response sets.

(a) Test design. Since response sets are a nuisance, test
designers should avoid forms of items which response sets infest.
This means that any form of measurement where the subject
is allowed to define the situation for himself in any way is to
be avoided. (We must make an exception for tests where the
way of interpreting the test is treated as a significant variable.
But even so, the above analysis suggests limits to the possible
validity of tests like the Rorschach which capitalize on
ambiguity.)

Item forms using fixed response-categories are particularly
open to criticism. The attitude-test pattern, where the subject
marks a statement A, a, U, d, or D, according to his degree of
agreement, is open to the following response sets: acquiescence,
or tendency to mark “A” and “a” more than “d” and “D”;
evasiveness, tendency to mark “U”; and tendency to go to
extremes, to mark “A” and “D” more than “a” and “d.”
Probably not all three of these sets will operate to a significant
degree in any given test, but it is better to eliminate the sets
at the outset than to spend effort later trying to measure the
effect of the sets and root them out. Test designers generally
have argued for retaining the five-point scale of judgment, or
the more indefinite seven-point, ten-point, or even continuous
scales. Such scales are open to marked individual differences in
definition of the reference positions, with the more complex
scale offering more chance for personal interpretation. The usual
argument for the more finely divided scale of judgment on each
attitude item is that it is more reliable and that subjects prefer
it. If the latter advantage is significant, the finer scale may be
retained and scored dichotomously. The argument that the
finer scale gives more reliability is not a sound one, since this
is precisely what we would expect if all of the added reliable
variance were response-set variance and had no relation to
beliefs about the attitude-object in question. There is no merit in
enhancing test reliability unless validity is enhanced at least
proportionately. It is an open question whether a finer scale
of judgment gives either a more valid ranking of subjects
according to belief, or (what we are beginning to recognize as
even more important) scores more saturated with valid variance.
With raters trained to interpret the scale uniformly, so that
response-set variance is removed, the finer scale may be
advantageous.
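
A minimal Python sketch may show how the three response sets could be tallied from a paper marked A, a, U, d, D, and how the same paper could be scored dichotomously. The function names, the sample marks, and the scoring key are hypothetical; in particular, giving no credit for “U” is an assumption of this sketch, not a rule stated in the text.

```python
from collections import Counter


def response_set_counts(marks):
    """Tally the three response sets for marks drawn from A, a, U, d, D."""
    n = Counter(marks)
    return {
        "acquiescence": (n["A"] + n["a"]) - (n["d"] + n["D"]),
        "evasiveness": n["U"],
        "extremeness": (n["A"] + n["D"]) - (n["a"] + n["d"]),
    }


def dichotomous_score(marks, keyed_agree):
    """Count items marked on the keyed side, ignoring degree of agreement.

    keyed_agree[i] is True when agreeing with item i indicates the attitude.
    "U" earns no credit (an assumption of this sketch).
    """
    score = 0
    for mark, agree_keyed in zip(marks, keyed_agree):
        if mark in ("A", "a") and agree_keyed:
            score += 1
        elif mark in ("d", "D") and not agree_keyed:
            score += 1
    return score


marks = ["A", "a", "U", "d", "D", "A"]
keyed = [True, True, True, False, False, False]

print(response_set_counts(marks))
# {'acquiescence': 1, 'evasiveness': 1, 'extremeness': 1}
print(dichotomous_score(marks, keyed))  # 4
```

Because the dichotomous score treats “A” and “a” alike, the tendency to go to extremes cannot enter it, though acquiescence and evasiveness still can unless the key is balanced.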

The writer therefore renews his earlier recommendation that
the following forms of item be avoided in tests where high
validity is more important than speed of test construction:
true-false, like-indifferent-dislike, same-different, yes-?-no,
agree-uncertain-disagree, and mark all correct answers. What
does this leave? Foremost, it leaves the forced-choice or
best-answer test. Our attempt to find a response set in the
multiple-choice test was almost completely unsuccessful. A set was
extracted, and that a set with little reliability, only when the
test was applied to subjects for whom it was unreasonably
difficult. Further studies of multiple-choice tests are still in
order, but experience to date justifies the assumption that they
are generally free from response sets. One confirmation of the
argument that forced choices should be used comes from a
study by Owens (18). He found that substituting forced-choice
for the “yes-no” response of the conventional neurotic
inventory significantly reduced the number of false positives, i.e., it
increased empirical validity. The forced choice has long been
used successfully in many fields. Tests of mental ability now
use it almost to the exclusion of other forms. Spelling,
arithmetic, and grammar tests can certainly be cast in “recognize
the right (or wrong) choice” form, rather than checklist forms
and others open to response sets. Thurstone used it
successfully in his paired-comparison approach to attitudes, and the
same approach has long been found satisfactory in
psychophysics. The Kuder interest test is well known, and Kuder has
recently developed a new test of personality in the same
forced-choice form. Paired comparisons may serve well in employee
rating, and the Army has found the forced-choice valuable in
obtaining officer ratings. Apparently forced-choice items can
be used for nearly all purposes now served by the inadequate
item forms.

Another important consideration is test difficulty, regardless
of item form. The influence of response sets rises with difficulty,
and therefore measurement of differences between students
who find the test difficult is particularly invalid. This is, first,
a reason for not using a test on subjects for whom it is quite
difficult. Second, however, it suggests basing measurement on
scales of adaptable difficulty. Thus, with the Kuhlmann-Anderson
mental-test series, one selects the scales which have a
difficulty appropriate for the subject, and if the first tests tried
prove to be too difficult, the tester can move to an easier set
of items to obtain more accurate measurement. Tests of this
type, which are common in psychophysics, would be hard to
use in group measurement; but experimental trial of such test
designs is worth considering. If the Seashore Pitch Test, for
example, were redesigned, one might have a preliminary section
of twenty (?) items, ranging from very hard to very easy. This
could be scored as soon as completed, and if the score were high,
the subject would be given a difficult 50-item test (perhaps
with all differences five cycles or two cycles). But a subject
who performed near the chance level on the preliminary test
would be given a final test of items with large differences
(perhaps 20 to 30 cycles). A set of several overlapping scales would
be required, all standardized on the same group. Such a test
could not test large groups inexpensively, but could be quite
accurate in testing individuals.
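
The routing step of such a redesigned test might be sketched as follows. This Python fragment is illustrative only: the function name, the three form labels, and both cutoffs are hypothetical values chosen for the example, since the text proposes the idea without fixing any numbers beyond the twenty preliminary items.

```python
def choose_final_form(preliminary_right, n_items=20,
                      near_chance=0.65, high=0.85):
    """Pick a final scale from the score on the preliminary section.

    near_chance and high are assumed cutoffs on the proportion correct;
    on a two-choice pitch item, pure guessing yields about 0.50.
    """
    if not 0 <= preliminary_right <= n_items:
        raise ValueError("score must lie between 0 and n_items")
    proportion = preliminary_right / n_items
    if proportion >= high:
        return "hard"      # e.g. small pitch differences
    if proportion <= near_chance:
        return "easy"      # e.g. large pitch differences
    return "medium"        # one of the overlapping intermediate scales


print(choose_final_form(18))  # hard
print(choose_final_form(11))  # easy
print(choose_final_form(15))  # medium
```

In practice the cutoffs would have to be set from standardization data on the overlapping scales, which is why they are left as parameters here.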

(b) Modification of directions. If, in any test, we expect a
particular response set to arise, we can revise the directions to
reduce the ambiguity of the situation. Another way of
accomplishing the same end is to give students general training in
test-wiseness. For example, if they know that in most
true-false tests about half the items are false, they will tend to
avoid excessive acquiescence. If they know that the correction
formula is based on chance, they will know that the odds are
in their favor when they respond to items where they are
uncertain.
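
The point about the correction formula can be checked with the conventional rights-minus-wrongs correction, score = R - W/(k - 1) for items with k choices. That this is the formula the writer has in mind is our assumption; the text does not state it. The Python sketch below, with hypothetical function names, shows that a blind guess neither gains nor loses on the average, while any partial knowledge makes answering profitable.

```python
def corrected_score(right, wrong, n_choices):
    """Conventional correction for guessing; omitted items are not counted."""
    return right - wrong / (n_choices - 1)


def expected_gain_per_guess(p_correct, n_choices):
    """Average change in corrected score from answering one doubtful item."""
    return p_correct - (1 - p_correct) / (n_choices - 1)


print(corrected_score(30, 10, 2))                  # 20.0
print(expected_gain_per_guess(0.5, 2))             # 0.0
print(round(expected_gain_per_guess(0.6, 2), 2))   # 0.2
```

A student who can do even slightly better than chance on the items he is unsure of therefore raises his expected score by marking them.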

It appears to the writer that, in most tests, subjects should
be directed to answer all items, even though this tends to
increase the random error variance. In many situations, this
source of error is less damaging than the constant errors
introduced by differences in tendency to guess, checking
threshold, or diligence in searching for correct answers. Wesman (25)
reports partial evidence that grammar items, where the
subject marks each error he notices in given sentences, become
more reliable when the subject is directed to mark every
sentence-part “correct” or “incorrect,” rather than just checking
the “incorrects” (but evidence on validity is lacking).

Whisler (27) raised the question of response-habits in Thur- 
stone-type attitude scales. He found that some subjects marked 
six or more items in a 22-item scale, and for them the reliability 
(parallel-test) of the attitude score was .89. But for the sub¬ 
jects who marked five or fewer items that they agreed with, 
the reliability was .62. Whisler thought that the subjects who 
checked more items were more careful in using the scale, or 
that their attitudes were more integrated. Hancock (8) followed 
Whisler with an experimental alteration of directions. First, he 
directed subjects to mark all the statements they accepted, 
then the five with which they most agreed, and, finally, the three 
of that five which they most strongly accepted. The shift of 
directions produced some alteration in scores. Generally, the 
standard deviation (in scale value) of scores increased when
fewer items were counted. For those with attitudes favorable 
to an occupation, the more items they checked, the closer 
their score was to the indifference position. Unfortunately, 
there is not enough evidence in the Hancock report to give a 
basis for selecting any particular number of checks as prefer¬ 
able. If the number of items checked affects mean, sigma, and 
reliability, there can be little justification for permitting the 
number to vary. It appears desirable to require every subject 
to mark a fixed number of alternatives, selecting the state¬ 
ments with which he most agrees. Limited experience with this 
procedure suggests that the subject should check around one- 
fourth of the statements. 
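
A fixed-number-of-checks rule of this kind might be scored as follows. The sketch assumes that each statement has a known scale value, that the subject's strength of agreement with each statement is available as a number, and that the score is the median scale value of the statements chosen; the data and the function name are hypothetical.

```python
from statistics import median


def fixed_check_score(scale_values, agreement, fraction=0.25):
    """Score a Thurstone-type scale from a fixed number of endorsed statements.

    The subject is treated as checking the `fraction` of statements with
    which he most agrees; the score is the median of their scale values.
    """
    n_checks = max(1, round(len(scale_values) * fraction))
    ranked = sorted(range(len(scale_values)), key=lambda i: agreement[i], reverse=True)
    chosen = ranked[:n_checks]
    return median(scale_values[i] for i in chosen)


# Hypothetical eight-statement scale; the subject agrees most with the first two.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ratings = [8, 7, 6, 5, 4, 3, 2, 1]
print(fixed_check_score(values, ratings))
```

Holding the number of checks constant removes the number of items marked as a source of variation in mean, sigma, and reliability.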

(c) Correction for response sets .—When response sets are 
entering scores on a test, we may control or correct for the 
effect by special scoring keys. One widely used method is the 
control score. If a “response-set score” can be obtained, we 
may identify all cases with extreme response sets and drop such 
cases from the sample, admitting that measurement for them 
is invalid. The most familiar examples appear in the control 
scores of the Minnesota Multiphasic. Many other tests also 
permit us to derive such scores as bias or acquiescence, or 
number of items marked. In some tests it may be acceptable 
to report two scores for every subject; all the essential data 
in the hypothetical spelling test discussed earlier could be re¬ 
ported in one score “number right” and a second “number 
marked as incorrect.” But simultaneous consideration of pat¬ 
terns of scores is awkward. 

Humm has long used the No-Count as a control score on his 
Temperament Scale. A comment in the Supplemental Manual 
for that test is of interest: 

It was observed that subjects whose scores in the Scale were 
at variance with the results of case studies by psychiatrists, 
psychologists, and social workers were found more often among 
those with an ultra-high or an ultra-low proportion of no¬ 
responses, than was the case where no-responses were in the 
middle ranges. Individuals who answer the questions of the 
scale with a high number of no-responses tend, consciously or 
unconsciously, to obscure their real temperaments. On the other 
hand, individuals with a low number of responses may exag¬ 
gerate their temperamental characteristics. 



Eliminating cases with extreme control-scores has the disad¬ 
vantage of throwing out numerous subjects, but it is vastly 
better than treating the subjects as if the scores were valid. 
Sometimes a simple solution is to readminister the test with 
more careful directions, as Bennett and others illustrate (1).
But more complex correction procedures are possible. In this, 
Humm and his co-workers were also pioneers. 

Two procedures have been developed for cases where No- 
Counts are extreme. The first is the “profile score." For an 
initial sample of 181 cases, Humm had a criterion score on 
each component the test claimed to measure. The profile score 
is the best estimate of the criterion score from the uncorrected 
score and the No-Count. This procedure, regressing from an 
external criterion rather than merely partialling out No-Count 
in terms of the zero-order r between No-Count and raw component
score, allows for the very reasonable assumption that
part of the No-Count variance represents significant elements 
in personality. 
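
A profile score of this sort amounts to a two-predictor regression of the criterion on the raw component score and the No-Count. The sketch below is one way to compute it under that reading; the data are invented, and the function names are assumptions rather than anything given in the Humm manual.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 from centered sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


def profile_score(raw, no_count, coefficients):
    """Best estimate of the criterion from raw score and No-Count."""
    a, b1, b2 = coefficients
    return a + b1 * raw + b2 * no_count


# Hypothetical norming sample: raw component scores, No-Counts, criterion scores.
raw_scores = [10, 12, 15, 11, 18, 14]
no_counts = [3, 9, 4, 7, 2, 8]
criterion = [2 + r - 0.5 * c for r, c in zip(raw_scores, no_counts)]
coefficients = fit_two_predictors(raw_scores, no_counts, criterion)
print(profile_score(13, 5, coefficients))
```

Because the weights come from the external criterion, any part of the No-Count that predicts the criterion keeps its weight instead of being partialled out.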

The second correction, reserved only for cases where profile 
scores are inadequately revealing, yields the “regression score.” 
This “stated the standard deviational distance of the given
component score from the mode of scores in that component
attained in scales showing the same No-Count. The regression
score takes no account of validity. It does not, therefore, consider
how well the Component Score measures the ‘true’ component
strength.” This, of course, partials out all the response-set
portion of the score variance.
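
Read literally, the quoted definition suggests a computation like the one sketched here. The quotation does not say how the standard deviation is taken, so the use of the population standard deviation among papers with the same No-Count is an assumption, as are the data and the function name.

```python
from statistics import mode, pstdev


def regression_score(score, no_count, norm_group):
    """Distance of a component score from the modal score of papers with the
    same No-Count, in standard deviation units.

    norm_group is a list of (component_score, no_count) pairs.
    """
    peers = [s for s, nc in norm_group if nc == no_count]
    return (score - mode(peers)) / pstdev(peers)


# Hypothetical norm group: four papers with No-Count 5 and one with No-Count 9.
norms = [(10, 5), (12, 5), (12, 5), (14, 5), (20, 9)]
print(regression_score(14, 5, norms))
```

Since each paper is compared only with papers showing the same No-Count, differences in No-Count cannot contribute to the resulting score.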

Humm and Humm (12) report that their procedures raise
the validity of interpretations, for those papers where correc¬ 
tion is required. Similar methods could no doubt be applied to 
other tests, and in the K-correction of the Multiphasic, a sim¬ 
ilar treatment is illustrated. Such refined statistical improve¬ 
ments are worth making only when one intends to treat a test 
quite seriously. It would scarcely be worthwhile to build a cor¬ 
rection score for acquiescence into the Bernreuter test, in view 
of the many other bases for doubting its validity. But where 
great statistical labor in the form of factor analysis has already 
entered such a test as Guilford’s series, application of a control
score for response sets may be worth serious consideration.



Correction for response sets is a problem in suppressor vari¬ 
ables (10, pp. 140-142). We wish to retain valid response-set 
variance (Type 3a), but we wish to remove from the score the 
variance of Type 3b and 2. If an independent estimate of the
Type 3a variance, or of the combined undesirable variance, 
could be obtained by a pure measure of the response set itself, 
this estimate might be used as a suppressor variable. 

Capitalizing on response-set variance .—If response sets are 
thought of as possibly contributing to validity, one may weight 
the response sets in a way that maximizes their contribution. 
Cook and Leeds (3) correlate each possible response on an 
attitude scale for teachers with a criterion, and assign positive 
or negative scoring weights accordingly. One item is as follows, 
where the numbers in parentheses are weights: 

Item: “It is sometimes necessary to break promises to children.”

Response   1          2       3           4          5
           Strongly   Agree   Undecided   Disagree   Strongly
           agree                                     disagree
Weight     (0)        (4)     (-1)        (4)        (-1)

The criterion used was a dependable estimate of the ability of 
teachers to establish rapport with children, which the scale 
was supposed to predict. It will be noted that the scoring 
weights are “illogical,” since there can be no stronger response
to “It is sometimes necessary . . .” than to disagree (response
4), which amounts to saying “It is never necessary.” The
weights for responses 4 and 5 reflect the difference in response 
set (not in logically considered opinion) between teachers in 
the superior and inferior criterion groups. The defense of the 
Cook-Leeds procedure, and the comparable method used in 
Strong’s Interest Blank, is that it yields considerable validity. 
The limitation is that invalid variance (Types 2 and 3b) is 
weighted just like valid variance. A particular "good” teacher 
who has a set to respond very emphatically will be penalized 
by the weights. The majority of “good” teachers, who avoid 
extreme responses, will be reliably discriminated by the key. 
One difficulty with the sheer empiricism represented here is 
that the weights serve their practical purpose but give little in¬ 
sight into the nature of the variables tested. The only basis for 
extending or improving the test is trial-and-error, developing
many more items of all sorts and trying them to see how the
weights come out. 
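
The empirical keying discussed above can be illustrated roughly as follows. Cook and Leeds correlate each response with the criterion; the sketch substitutes a simpler stand-in, the difference between the proportions of superior and inferior teachers choosing each response, and all data, names, and the scaling constant are hypothetical.

```python
from collections import Counter


def empirical_weights(superior, inferior, options=(1, 2, 3, 4, 5), scale=10):
    """Weight each response by how much more often the superior group chose it."""
    sup, inf = Counter(superior), Counter(inferior)
    return {
        opt: round(scale * (sup[opt] / len(superior) - inf[opt] / len(inferior)))
        for opt in options
    }


def score_paper(responses, key):
    """Sum the keyed weights for one subject; responses maps item -> option."""
    return sum(key[item][option] for item, option in responses.items())


# Hypothetical responses to one item from two small criterion groups.
superior_group = [4, 4, 4, 2, 5]
inferior_group = [5, 5, 1, 4, 3]
key = {"promises": empirical_weights(superior_group, inferior_group)}
print(key["promises"])
print(score_paper({"promises": 4}, key), score_paper({"promises": 5}, key))
```

In this invented sample "disagree" earns a positive weight and "strongly disagree" a negative one, so a subject inclined to emphatic responses loses credit whatever his opinion. That is the limitation noted above.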

Sometimes, instead of employing correction scores to refine 
the total test score, one may modify the original test scores. 
Thus Flanagan (23, p. 9) suggests scoring Rights and Wrongs
separately, and using each score in the multiple-correlation 
when trying to predict a criterion. This procedure permits one 
to weight "carefulness” variance separately from "ability” vari¬ 
ance. Work with true-false tests suggests that scores Rights-on- 
True-Items and Rights-on-False-Items will have different valid¬ 
ity and may be assigned different weights in the predictor 
score (5). Probably this notion could be extended further, in 
empirical prediction. 
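
Separate part-scores of the kind mentioned for true-false tests might be obtained as in the sketch below. The weights shown are placeholders; in practice they would come from a multiple regression against the criterion, and the function names are assumptions.

```python
def split_rights(key, answers):
    """Count rights separately on true-keyed and false-keyed items.

    key: list of bools giving the correct answer to each item.
    answers: list of True, False, or None (item omitted).
    """
    rights_on_true = sum(1 for k, a in zip(key, answers) if k and a is True)
    rights_on_false = sum(1 for k, a in zip(key, answers) if (not k) and a is False)
    return rights_on_true, rights_on_false


def predictor_score(key, answers, w_true=1.0, w_false=2.0):
    """Weighted composite of the two part-scores (weights are hypothetical)."""
    rights_on_true, rights_on_false = split_rights(key, answers)
    return w_true * rights_on_true + w_false * rights_on_false


item_key = [True, True, False, False, True]
acquiescent = [True, True, True, True, True]   # marks every item "true"
careful = [True, True, True, False, True]      # catches one false item
print(split_rights(item_key, acquiescent), split_rights(item_key, careful))
```

The wholly acquiescent subject earns full credit on true items and none on false items, so the two part-scores carry different information and can be weighted differently.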

Summary 

This paper summarizes extensive evidence demonstrating 
that such response sets as bias in favor of a particular alterna¬ 
tive, tendency to guess, working for speed rather than accuracy, 
and the like, operate in conventional objective tests. Not only 
are such sets widespread, but they reduce the validity of test 
scores. The response set can be altered readily by alteration of 
the directions or by coaching. Some studies show that response 
sets are somewhat correlated from one test to another (but not 
if the tests differ greatly in content), and that they are corre¬ 
lated with important external variables. While response-set 
variance may under certain circumstances enhance logical and 
empirical validity, it appears that its general effect is to reduce 
the saturation of the test and to limit its possible validity. 

The following recommendations for practice, most of which 
were previously suggested, are reinforced by the present find¬ 
ings: 

1. Response sets should be avoided with the occasional ex¬ 
ception of some tests measuring carefulness or other personality 
traits which are psychologically similar to response sets. 

2. The forced-choice, paired-comparison, or “do-guess” multiple-choice
test should be given preference over other forms of
test item. 

3. When a form of item is used in which response sets are 
possible, 



a) Directions should be worded so as to reduce ambiguity 
and to force every student to respond with the same 
set. 

b) The test should not be given to a group of students for 
whom it is quite difficult. 

c) A response-set score should be obtained and used to
identify subjects whose scores are probably invalid. 

4. Where response sets are present, attempts should be made 
to correct for or to capitalize on the response set by an appro¬ 
priate empirical procedure. 

In view of the overwhelming evidence that many common 
item forms invite response sets, and in view of the probability 
that these sets interfere with accurate measurement, it will 
rarely be wise to build new tests around item forms such as 
A-U-D, Yes-No-?, and “check all correct answers.” It is to be
hoped that the tests forthcoming in the future will be designed 
to increase their saturation with the factors the test is seeking 
to measure. 


REFERENCES 

1. Bennett, George K., Seashore, Harold G. and Wesman, Alexander
G. Differential Aptitude Tests, Manual. New York: Psychological
Corporation, 1947.

2. Brotherton, D. A., Read, J. M. and Pratt, K. C. “Indeterminate
Number Concepts: II. Application by Children to Determinate
Number Groups.” Journal of Genetic Psychology,
LXXIII (1948), 209-236.

3. Cook, Walter W. and Leeds, Carroll H. “Measuring Teacher
Personality.” Educational and Psychological Measurement,
VII (1947), 399-410.

4. Cronbach, L. J. “Response Sets and Test Validity.” Educational
and Psychological Measurement, VI (1946), 475-494.

5. Cronbach, L. J. “Studies of Acquiescence as a Factor in the
True-False Test.” Journal of Educational Psychology,
XXXIII (1942), 401-415.

6. Goodfellow, Louis D. “The Human Element in Probability.”
Journal of General Psychology, XXXIII (1940), 201-205.

7. Guilford, J. P. (ed.) Printed Classification Tests. AAF Aviation
Psychology Program Research Reports, No. 5. Washington,
D. C.: Government Printing Office, 1947.

8. Hancock, John W. “An Experimental Study of Limiting Response
on Attitude Scales.” In H. H. Remmers (ed.), Further
Studies in Attitudes, Series III. Studies in Higher Education,
XXXIV. Lafayette, Ind.: Purdue University, 1938. Pp.
142-148.



9. Harlow, H. F. “The Formation of Learning Sets.” Psychological
Review, LVI (1949), 51-65.

10. Horst, Paul. The Prediction of Personal Adjustment. New York:
Social Science Research Council, 1941.

11. Humm, Doncaster G. and Wadsworth, Guy, Jr. The Interpretation
of the Humm-Wadsworth Temperament Scale. Los Angeles:
D. G. Humm, 1943.

12. Humm, Doncaster G. and Humm, Kathryn A. “Compensations
for Subjects’ Response-Bias in a Measure of Temperament.”
American Psychologist, II (1947), 305.

13. Johnston, Aaron Montgomery. “The Relationship of Various
Factors to Autocratic and Democratic Classroom Practices.”
Unpublished doctoral dissertation, University of Chicago,
1948.

14. Kahn, D. F. and Hadley, J. M. “Factors Related to Life Insurance
Selling.” Journal of Applied Psychology, XXXIII (1949),
131-140.

15. Lorge, I. “Gen-like: Halo or Reality?” Psychological Bulletin,
XXXIV (1937), 545-546.

16. Mathews, C. O. “The Effect of the Order of Printed Response on
an Interest Questionnaire.” Journal of Educational Psychology,
XX (1929), 128-134.

17. Mersman, Ivo. “Personality Traits as Related to Vocational
Choice.” Unpublished masters’ thesis, University of Chicago,
1948.

18. Owens, W. A. “Item Form and ‘False-Positive’ Responses on a
Neurotic Inventory.” Journal of Clinical Psychology, III
(1947), 264-269.

19. Philip, B. R. “Generalization and Central Tendency in the Discrimination
of a Series of Stimuli.” Canadian Journal of
Psychology, I (1947), 196-204.

20. Rubin, Harry K. “A Constant Error in the Seashore Test of
Pitch Discrimination.” Unpublished masters’ thesis, University
of Wisconsin, 1940.

21. Singer, William B. and Young, Paul T. “Studies in Affective
Reaction: III. The Specificity of Affective Reactions.” Journal
of General Psychology, XXIV (1941), 327-341.

22. Thorndike, R. L. “Critical Note on the Pressey Interest-Attitudes
Test.” Journal of Applied Psychology, XXII (1938), 657-658.

23. Vaughn, K. W. (ed.) “National Projects in Educational Measurement.”
American Council on Education Studies, Series I,
No. 28 (1947), pp. 8-12.

24. Vernon, P. E. “Classifying High Grade Occupational Interests.”
Journal of Abnormal and Social Psychology, XLIV (1949),
85-96.

25. Wesman, Alexander G. “Active versus Blank Responses to
Multiple-Choice Items.” Journal of Educational Psychology,
XXXVIII (1947), 89-95.



26. Wesman, Alexander G. “The Usefulness of Correctly Spelled
Words in a Spelling Test.” Journal of Educational Psychology,
XXXVII (1947), 242-246.

27. Whisler, L. D. “‘Reliability’ of Scores on Attitude Scales as
Related to Scoring Method.” In H. H. Remmers (ed.),
Further Studies in Attitudes, Series III. Studies in Higher
Education, XXXIV. Lafayette, Ind.: Purdue University,
1938. Pp. 126-129.

28. Wyatt, Ruth F. “Improvability of Pitch Discrimination.”
Psychological Monographs, LVIII (1945), No. 267.



CLIENT ACCEPTANCE OF SELF-INFORMATION 
IN COUNSELING 


ROBERT B. KAMM 
Drake University 
and 

C. GILBERT WRENN 
University of Minnesota 

The counseling process is generally recognized today as a 
professional psychological function. Writers on the subject 
agree that the counseling experience is a dynamic relation¬ 
ship between two people—an ever-changing relationship to 
which many variables contribute. This concept has emerged
as a result of three somewhat varied, yet related, types of re¬ 
search on the counseling process: studies of evaluation, studies 
of counseling methodology, and studies of factors operative 
within the counseling interview. The present research study is 
classified in the last group in that it is a consideration of factors
at work within the interview situation.

Research has shown that certain students seem to benefit 
from the counseling process. On the other hand, other students 
apparently do not benefit from this experience. Some students 
appear to follow counselor suggestions and to accept informa¬ 
tion more readily than do others. The question arises: Within 
which interview situations do clients tend to accept informa¬ 
tion presented, and within which do they tend not to accept 
the data? Further, in what ways, if any, do students who 
accept the information differ from those who do not? Also, 
what types of information tend to be accepted, and what types 
tend not to be? 

In the present study “acceptance” is defined as favorable 
reception by the client of information presented to him, as 
demonstrated by (a) what the client says and (b) what the 
client does. “Information” includes all data presented by the 
counselor, whether they be in the form of advice, suggestion,
emphasis, recommendation, interpretation, request or explana¬ 
tion. The type of interview in the present study is limited to 
educational-vocational planning interviews. 

Methodology of Study 

Complete phonographic recordings were made of forty educational-vocational
planning interviews, all conducted by one trained, experienced counselor.
The clients were University of Minnesota first-quarter General College
freshman men who voluntarily sought counseling. They were typical of the General
College population with regard to academic ability and voca¬ 
tional interests. 

Just prior to the actual recording, each of the clients completed
an “immediate pre-interview” form of inquiry pertaining
to his educational and vocational plans. During the recorded
interview, the client’s academic ability, interests, and aptitudes were
discussed with him, and the counselor made suggestions and
recommendations. Within several days following that interview,
each client completed an “immediate
post-interview” form of inquiry. In addition, the counselor
after each interview indicated on a check list his judgment 
with regard to the emotional states of the client and counselor 
and the degree of rapport achieved in the interview. 

At one month and again at four months after the recorded 
interview, the investigator interviewed each of the clients in 
an effort to gain additional evidence for and against acceptance 
on the part of each client. Following this, all of the pre-inter¬ 
view and post-interview data were summarized for each case 
and presented to a team of three judges who, working inde¬ 
pendently, decided in which cases “acceptance” had occurred 
or had not occurred. A composite of the judges’ decisions was 
made in order to categorize the cases as acceptance or non- 
acceptance. 
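
The report does not say how the composite of the three judges' decisions was formed. One plausible rule, offered purely as an assumption, is to place the five rating categories on a numerical scale and take the median of the three independent ratings; the numerical codes and function names below are hypothetical.

```python
from statistics import median

# Hypothetical numerical coding of the five rating categories.
RATING = {
    "definitely accepted": 2,
    "for the most part accepted": 1,
    "indecisive": 0,
    "for the most part not accepted": -1,
    "definitely not accepted": -2,
}


def composite_rating(judgments):
    """Median of the judges' independent ratings for one case."""
    return median(RATING[j] for j in judgments)


def classify(judgments):
    """Assign the case to acceptance, non-acceptance, or indecisive."""
    value = composite_rating(judgments)
    if value > 0:
        return "acceptance"
    if value < 0:
        return "non-acceptance"
    return "indecisive"


print(classify(["definitely accepted", "for the most part accepted", "indecisive"]))
```

With three judges the median always coincides with one judge's rating, so no case receives a composite that nobody assigned.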

In the meantime written transcriptions had been made of 
each of the forty recorded interviews. Following this, each of 
the client and counselor responses, numbering 12,238, was
categorized into one of twenty-two categories. 

For the classification of counselor responses, Seeman’s nine 
categories were used (4). These are: (a) counselor questions
dealing with content or factual data; (b) counselor questions 
concerned with the attitudes and motivations of the client; 
(c) counselor responses to content; (d) counselor responses to
feelings, attitudes and motivations of the client; (e) counselor 
interpretations and opinions concerning content; (f) counselor 
interpretations and opinions concerning client attitudes, feel¬ 
ings and motivations; (g) suggestions, advice and counselor 
decisions on courses of action for the client; (h) suggestions, 
advice and counselor decisions concerning client attitudes and 
feelings; (i) information given by the counselor. 

In addition to Seeman’s nine categories, two additional coun¬ 
selor-response categories were used in the present study. These 
were: (a) unclassified and (b) simple agreement (“Yes,” “uh-huh”).

In the categorization of client responses, Snyder’s eight gen¬ 
eral categories for client content and three general categories 
for client-feeling responses were employed in the study (5). 
The client-content categories are: (a) problem; (b) asking for 
information; (c) disagreements; (d) answering questions; (e) 
agreement; (f) insight; (g) planning; and (h) miscellaneous. 

The client-feeling categories include: (a) positive attitudes 
(statements which reveal approval and acceptance of the client 
himself, the counselor or the counseling process or other per¬ 
sons, objects or situations); (b) negative attitudes (statements 
which reveal disapproval, or rejection of the client himself, 
the counselor or the counseling process, or other persons, ob¬ 
jects or situations); (c) ambivalent attitudes. 

Pertinent personal data such as previous work experience, 
education and home background, as well as academic-aptitude 
test scores and interest and personality inventory results, were 
also gathered for each of the forty cases. 

Findings of the Study 

The composite rating of the judges showed that in twenty- 
six of the forty cases the clients either “definitely” or “for the 
most part” accepted information presented in the interview. 
The other fourteen cases were divided among the “indecisive” 
cases and the “definitely” and “for the most part” non-acceptance
cases. The heavy weighting of acceptance cases may be
attributed in part to the fact that the clients came voluntarily 
for help with their problems. 

The most important findings of this investigation pertinent 
to the dynamics of acceptance are the following: 

1. Client acceptance of information presented occurs most
often in those situations in which both client and counselor 
are completely relaxed. When either of the two, or both, 
are not relaxed, acceptance is less likely to occur. 

2. Acceptance is directly related to “positive attitude” as expressed
by clients during the interview. Acceptance, on the
other hand, is inversely related to both negative and 
ambivalent attitudes, as expressed by clients during the 
interview. 

3. Acceptance is directly related to a “readiness” for coun¬ 
seling help. Merely having a “felt need” on the part of 
the client does not necessarily mean that acceptance of 
information, pertaining to that need, will occur. A readi¬ 
ness to act with regard to a felt need appears to be the cru¬ 
cial factor with regard to acceptance. 

4. Information which is directly related to the client's own 
immediate problem tends to be accepted. 

5. Information which is not in opposition to client self-concept 
tends to be accepted. Further, information which shows 
the client to be like others of his group tends to be ac¬ 
cepted whereas information which shows him to be devi¬ 
ate tends not to be accepted. 

Less crucial findings of the study are: 

1. The counselor used in the present study did not differ 
significantly in his counseling approach for the accept¬ 
ance and the non-acceptance groups. It was found that, 
as the interview progressed, he (a) asked fewer ques¬ 
tions, (b) gave more suggestions and directions, and (c) 
showed less simple agreement. He showed an increase 
in information-giving from the initial one-third of the 
interview to the middle one-third and then a decrease 
from the middle to the final one-third. He made little 
use of feeling-responses. 

2. Although the counselor’s approach in the interview situation
does not vary significantly in the present study,
some clients accept information presented, whereas
others do not. This suggests the operation of factors 
other than the counselor's approach in the determination 
of client acceptance. 

3. Both acceptance and non-acceptance of information can 
occur in situations in which the client-counselor rela¬ 
tionship is friendly. Also, when an apathetic relationship 
is experienced, either acceptance or non-acceptance can 
occur. 

4. There appears to be a positive relationship between non- 
acceptance and the achievement of only a "surface” 
understanding of the problem by the client and the 
counselor, as indicated by the counselor’s rating of the 
interview. 

5. Acceptance does not appear to be related to client use of 
such categorized responses during the interview as (a) 
statement of the problem, (b) answering of counselor 
questions, (c) indications of insight gained, (d) indica¬ 
tions of plans, and, (e) unrelated client discussion. The 
data suggest (although the findings between acceptance 
and non-acceptance groups are not statistically signifi¬ 
cant) that acceptance may be related to client agree¬ 
ment and inversely related to client disagreement, as 
shown by client responses during the interview. Accep¬ 
tance may also be inversely related to the asking of 
factual questions during the interview. 

6. For both the acceptance and the non-acceptance cases 
as the interviews progressed, there was (a) a decrease in 
client statement of the problem, (b) an increase, fol¬ 
lowed by a tapering off, in the asking of questions by 
the client, (c) a decrease in client answering of coun¬ 
selor questions, (d) an increase in client agreement, and 
(e) an increase in client statements pertaining to plans.
The non-acceptance group showed an increase in un¬ 
related statements, whereas the acceptance group was 
constant with regard to unrelated data. 

7. The non-acceptance cases, like the acceptance cases,
showed an increase in the expression of positive feelings 
as the interview progressed. The level of expression of
positive feelings, however, was significantly lower 
throughout the interview for the non-acceptance group. 
The two groups likewise showed parallel patterns of 
decrease of negativism and ambivalence. 

8. Acceptance appears to be unrelated to the factors of 
(a) the length of the interview, (b) the time of the day 
of the interview, and (c) the proportion of time which 
the client speaks during the course of the interview. 

9. Acceptance is unrelated to (a) academic aptitude, (b) par¬ 
ticular measured personality patterns, (c) social status 
of the client’s home, (d) veteran status, (e) marital 
status, (f) part-time work status while in college or (g) 
the factor of previous client-counseling contacts. 

10. For those judged to have “definitely” accepted informa¬ 
tion presented, there appears to be a direct relationship 
between acceptance and good first-quarter academic 
achievement. 

11. With regard to vocational interest patterns, acceptance 
appears to be related to the presence of interest profiles 
which contain all three types of interest patterns: pri¬ 
mary, secondary, and tertiary. Except for this finding, 
acceptance is unrelated to any particular vocational in¬ 
terest pattern. 

12. Different kinds of information are accepted equally well 
by the acceptance and non-acceptance groups, with one 
possible exception. Information which involves an alter¬ 
ing of previously made client plans tends to be more 
often accepted by the group defined as the “acceptance” 
group. 

Conclusions and Implications for Counseling 

Conclusions obtained from the present findings and implica¬ 
tions for counseling follow: 

1. The importance of certain psychological factors in the 
acceptance of information has been noted. The most conclusive
of all the findings, perhaps, is that acceptance is related to client
feeling, particularly feeling or attitude toward self. The importance
of an emotionally relaxed client-counselor relationship
has been shown. The factor of “readiness” has been indicated.
Further, it has been pointed out that information which
is directly related to the client’s own immediate needs is likely
to be accepted, as is information which does not oppose or injure
the client self-concept.

The counselor must recognize the presence of positive, nega¬ 
tive, and ambivalent attitudes of the client. If the client shows 
a predominance of negativism and/or ambivalence, it may be 
necessary that the counselor structure the counseling process 
in such a manner that there would be a series of "preparation 
for educational-vocational planning" contacts, devoted to the 
development of proper client sets and attitudes. Once this is 
done, acceptance of information pertaining to educational-vo¬ 
cational planning might take place more readily. 

On the other hand, if the client demonstrates a warmth 
toward the interview, toward the counselor, toward himself, 
as well as toward others, the planning interview can proceed 
and the counselor may feel reasonably certain, other factors 
being equal, that acceptance of information will occur. 

The finding that there is an increase in positive expression 
and decreases in negativism and ambivalence for the non- 
acceptance cases, as the interview progresses, poses an in¬ 
teresting problem. In the first place, this finding should be 
indicative to the counselor that all clients who “warm up” 
during the interview will not necessarily accept. More impor¬ 
tant, however, this demonstrated rise in positive feelings may 
be interpreted as an encouraging sign, a sign that may be
indicative of acceptance at some later time, providing the 
client is given proper orientation and preparation for the edu¬ 
cational-vocational planning session. 

On the other hand this warming-up may be merely an ex¬ 
pression of a pleasant social convention. In our culture all of 
us are taught to be as agreeable as possible, to put our best 
social face forward. Hathaway (3) has called this the "hello- 
goodbye" convention, this tendency to be pleasant and to 
express formal gratitude at the end of the interview. He warns
against utilizing such expressions of goodwill in the interview 
as measures of the effectiveness of the interview. Hence this 
rise in positive feelings may be only a measure of the social 
graciousness of the client. 



Closely allied is the factor of "readiness.” Negativism and 
ambivalence may actually be indicative of a lack of readiness 
in some cases. The counselor must determine whether or not 
the client is ready for educational-vocational planning. If he 
is not ready, perhaps there will need to be "preparation for 
planning” contacts, as previously mentioned. In this connec¬ 
tion one should not forget Butler’s differentiation between the 
adjustment and distributive phases of counseling (2). He contends
that the distributive phase (Kefauver’s term; the use of
"planning phase” might be even more appropriate) should 
not be entered upon until adjustment to the present and to 
himself has been assured. Thus the "preparation for planning” 
spoken of here may mean adjustment counseling using permis¬ 
sive methods of treatment. The “readiness to act” mentioned 
earlier may merely mean a lack of preoccupation with areas 
of self-regard other than those associated with the planning 
at hand. 

The establishment of an emotionally relaxed relationship 
between client and counselor is, apparently, necessary before 
information will be accepted. The importance of good rapport 
in the interview relationship has long been recognized. The 
importance of an emotionally relaxed state as a contributor 
to good rapport is specifically noted here. The counselor is 
obligated to establish, insofar as possible, such a relationship. 

The counselor must recognize what needs are most immediate
and most pertinent to the client. That which the client recognizes
as real, not what the counselor sees, is most important. Accordingly,
if acceptance is to occur, it is necessary to start at
the level where the client operates at the moment. The coun¬ 
selor must use techniques designed to assist in the development 
of the client to a more realistic awareness of himself. Here is 
introduced again the factor of readiness or of need for self¬ 
adjustment, showing how the factors related to acceptance are 
not discrete but intertwined. 

The importance of starting at the level of thinking of the 
client is further given support by the present finding that 
information not in line with previous client plans tends to be 
rejected. The counselor must be aware of these previous plans, 
goals and objectives of the client. It is necessary for him to
recognize them and to take them into consideration if acceptance
is to occur.

2. The lack of relationship between acceptance and the content
of client response during the interview serves in a sense
to point up and to give emphasis to the importance of client 
feeling. It appears that the counselor will do well to pay less 
attention to the content of what the client says and more to 
how the client feels. 

3. With regard to acceptance, such traditionally stressed 
factors as academic aptitude, personality patterns, vocational 
interest patterns, home background and previous counseling 
contacts show little or no relationship to acceptance. These 
data, useful as they are in some situations, do not seem to be 
crucial insofar as acceptance is concerned. The acceptance of 
information presented apparently can occur in spite of low 
academic ability or a poor home background. Likewise, such 
factors as the length of the interview, the time of day of the 
interview, and the proportion of time in which the client speaks 
during the interview may not deserve the attention they are 
sometimes given, at least as far as acceptance is concerned. 

4. The client with his needs, his wants and desires, his atti¬ 
tudes and feelings is the basic determiner of whether or not 
acceptance occurs. The data suggest that the client himself is 
more important than the interview situation itself or the type 
of information presented. 

5. Certain individuals may benefit little, if any, from a par¬ 
ticular counseling contact. In the present study with a group 
of college freshmen (typical of General College freshmen in 
general), there appears to be little reason for admitting that 
many of these students would never accept information pre¬ 
sented during an educational-vocational planning interview. 
The evidence seems to indicate that all of the clients in the 
present group, with further attention, might develop to a state 
of acceptance. The possibility needs to be explored that there 
would be a greater acceptance of test information if test selec¬ 
tion were made by the clients in the manner suggested by 
Bordin and Bixler (1). Theoretically such client-chosen tests 
would be in personality areas where there is adequate “readiness”
for acceptance of results. Any attempt to assist the client
toward a realistic self-acceptance must always start at the level
of the client and must be developed at a pace agreeable to the 
client. Such a technique assumes that the counselor has suffi¬ 
cient insight to recognize underlying client problems and to 
see them in the total picture of the client’s existence. 

A Proposal For Further Research 

The present study might be regarded as a “pilot study,” 
inasmuch as it was limited in scope and was a pioneering ven¬ 
ture with regard to the methodology of the problem. For prac¬ 
tical considerations the size of the sample was limited. In 
order to permit generalization from the small sample, 
the sample was restricted to one stratum of the college popula¬ 
tion, thereby securing a more homogeneous group. To limit the 
variables operative within the interview situation, only one 
counselor was used. These limitations were deemed necessary 
for the present study. A similar study should be carried out in 
which the following conditions might be observed. 

1. A larger sample, representative of the total college popula¬ 
tion, should be utilized. 

2. Several trained counselors should be employed to do the 
counseling. The counselors used should be known to vary 
from the more “non-directive” to the “directive” approach 
with regard to counseling philosophy and methodology. 
They might include those avowedly “eclectic” or those 
psychoanalytic in orientation. 

3. Recorded data should include the entire series of contacts 
with each case rather than only one interview. 

4. Problems not only pertaining to educational-vocational 
planning needs, but to other problem areas as well, should 
be studied. 

5. Careful investigation of pre-interview behavior and rigorous 
observation of post-interview behavior should be done. 

If the above suggestions were followed, a pool of data would 
be available which would provide many answers concerning
the dynamics of the counseling process. Such a pool would 
provide data for any degree of intensity or extensiveness that 
would be desired. Acceptance of data pertaining to emotional 
and personality problems might be investigated. Detailed an¬ 
alysis of client sets which are brought to the interview could 
be made. Likewise, such other psychological aspects of the 




interview as client motivation and the interaction of two per¬ 
sonalities participating in the interviews could be studied. Stud¬ 
ies of counselor methodology and careful analysis of segments 
of the interview could be made. The agency or institution 
which is willing to provide sufficient backing for such an enter¬ 
prise will step into a position of leadership in counseling re¬ 
search. 


REFERENCES 

1. Bordin, Edward S. and Bixler, Ray H. “Test Selection: A Process
of Counseling.” Educational and Psychological Measurement,
VI (1946), 361-373.

2. Butler, John M. “On the Role of Directive and Non-Directive
Techniques in the Counseling Process.” Educational and
Psychological Measurement, VIII (1948), 201-209.

3. Hathaway, Starke R. “Some Considerations Relative to Non-Directive
Counseling as Therapy.” Journal of Clinical Psychology,
III (1948), 226-231.

4. Seeman, Julius. “A Study of Preliminary Interview Methods in
Vocational Counseling and Client Reactions to Counseling.”
Unpublished Ph.D. thesis, University of Minnesota, 1948.

5. Snyder, William U. “An Investigation of the Nature of Non-Directive
Psychotherapy.” Journal of General Psychology,
XXXIII (1945), 193-223.



THE CONCEPTS OF RELIABILITY 
AND HOMOGENEITY

C. H. COOMBS 1 
University of Michigan 

I. Introduction 

The literature of test theory is replete with articles on the 
computation and interpretation of indices of reliability. In 
them one finds surprisingly little common agreement or even 
mutual understanding (6). In more recent years the concept of 
homogeneity, with its indices, has been added, with the result 
that the confusion has increased. We shall make no effort in 
this paper to review and summarize this literature but shall 
attempt to do three things: 

(1) point out what we regard as the fundamental sources of 
this confusion; 

(2) provide a theoretical foundation on the basis of which 
this confusion might be resolved; 

(3) point out the further steps that must be taken to develop 
the theory and practice of mental testing. 

II. Sources of Present Confusion 

There are two fundamental sources² of confusion in present
test theory: one is the assumptions by means of which we arrive 
at an interval scale (3), and the second is the identification of 


1 This paper is an extension to the area of mental testing of some of the ideas contained
in a chapter in a general theory of psychological scaling developed in 1948-1949
under the auspices of the Rand Corporation and while in residence in the Department
and the Laboratory of Social Relations, Harvard University. While the author carries
the responsibility for the ideas contained herein, their development would not have
been possible without the criticism and stimulation of Samuel A. Stouffer, C. Frederick
Mosteller, Paul Lazarsfeld, and Benjamin W. White in a joint seminar during that
year. Development of the theory before and after the sojourn at Harvard was made
possible by the support of the Bureau of Psychological Services, Institute for Human
Adjustment, Horace H. Rackham School of Graduate Studies, University of Michigan.
A version of these ideas was presented in a 1949 APA symposium on Test Homogeneity
and Test Validity.

2 A complete discussion of the fundamental difficulties in present test theory is to
be found in Thomas (5). 




our statistical indices with the concepts they are presumed to
measure. These two basic difficulties are intimately related and 
are both associated with our attempt to model psychological 
measurement on physical measurement. Let us discuss them 
briefly, in turn.

Consider the manner in which data are obtained in the area 
of mental testing: The method used is the method of single 
stimuli, in which there is one response from each individual to 
each stimulus. These responses comprise our basic data, and 
consist of two piles of items for each individual. One pile has 
the items which the individual passed and the other pile those 
items which he failed. Note that there is no information in 
the data for a given individual pertaining to (1) how well he
passed one item compared with another, or (2) how badly he 
failed one item compared with another, or (3) finally, how badly 
he failed one item compared with how well he passed another. 
The only way to obtain metric relations in data collected by the 
method of single stimuli is to put the information in the data by 
means of a priori statistical assumptions concerning, for exam¬ 
ple, the shape of the distribution function of the abilities of 
the individuals on the attribute in question. A normal distri¬ 
bution is usually what is assumed in test theory but even this 
is not applied in a thoroughgoing fashion. 

To carry out the assumption fully (1) the percentage passing 
each item should be corrected for chance, then (2) converted 
to a sigma score, and (3) items at equal intervals on this sigma 
scale should be selected for a final form. This procedure is 
usually not rigorously adhered to because, in the first place, it 
makes little practical difference, in many instances, if the items 
are not precisely distributed in a discrete rectangular distribu¬ 
tion on this sigma scale. But there is another reason why it is 
not insisted that this procedure should be rigorously adhered 
to, and that is because the assumptions which lead to a unit 
of measurement implicitly require the further assumption of 
perfect homogeneity. The distrust of the procedure is supported 
by the fact that the assumption of perfect homogeneity can 
usually, if not always, be shown to be violated, even in such 
crude data as that collected by the method of single stimuli. 
Unfortunately, to many this is simply regarded as one of the
sources of error variance and not as a fundamental theoretical
obstruction.
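The three steps named above (correct for chance, convert to a sigma score, pick items at equal intervals) can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's procedure: the chance correction assumes k-choice items answered by blind guessing when the answer is not known, the "sigma score" is taken to be the normal deviate of the corrected proportion failing, and the item names, proportions, and helper functions are hypothetical.

```python
from statistics import NormalDist


def correct_for_chance(p_pass, n_choices):
    """Proportion passing after removing the share expected from blind guessing.

    Assumed correction for a k-choice item: (p - 1/k) / (1 - 1/k).
    """
    guess = 1.0 / n_choices
    return (p_pass - guess) / (1.0 - guess)


def sigma_score(p_pass):
    """Normal deviate of an item; fewer people passing gives a higher value.

    Requires 0 < p_pass < 1, which is where the normality assumption enters.
    """
    return NormalDist().inv_cdf(1.0 - p_pass)


def pick_equally_spaced(scores, k):
    """Pick k items whose sigma scores lie nearest k equally spaced targets."""
    lo, hi = min(scores.values()), max(scores.values())
    targets = [lo + (hi - lo) * m / (k - 1) for m in range(k)]
    chosen = []
    for target in targets:
        remaining = {name: s for name, s in scores.items() if name not in chosen}
        chosen.append(min(remaining, key=lambda name: abs(remaining[name] - target)))
    return chosen


# Hypothetical data: raw proportion passing each five-choice item.
raw = {"a": 0.92, "b": 0.80, "c": 0.76, "d": 0.60, "e": 0.44, "f": 0.36}
scores = {name: sigma_score(correct_for_chance(p, 5)) for name, p in raw.items()}
final_form = pick_equally_spaced(scores, 3)
```

With these made-up figures the sketch keeps items a, d and f. Note that every number on the resulting scale comes from the assumed normal distribution, not from the pass-fail data themselves, which is the point the text is making.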

Thus, in the method of single stimuli as applied to mental 
testing we create an interval scale without any built-in or in¬ 
herent test of its validity. Having such a scale, then, it is per¬ 
missible to use certain properties of numbers, and we have 
available a variety of statistical procedures for the analysis 
of behavior. We must, of course, allow for error variance, much 
of which we have put there ourselves in assuming an interval 
scale, and, consequently, a statistical theory of error becomes 
necessary and plays a dominant role in test theory. This, then, 
is one major source of difficulty in the area of tests and measure¬ 
ments but, important as it is, it is not as fundamental as the 
second source. The difficulty arising from assumptions lead¬ 
ing to an interval scale is of significance primarily to the em¬ 
pirical aspect of psychological testing rather than to the theo¬ 
retical aspect. 

The second source of difficulty, which we consider to be of 
prime theoretical significance, has, however, arisen from the 
use of an interval scale. Basically, this second source of confusion
is the fact that we have had no fundamental psychological
rationale underlying our concepts in test theory. Rather, we 
find an easy road to the concepts of test score, difficulty of an 
item, reliability and homogeneity via statistical definitions of 
indices dependent upon the existence of an interval scale. We 
set up these statistical indices based on operational procedures, 
then give names to them and act as if they have certain obvious 
psychological meanings. We have gained readily obtainable 
empirical indices but have paid for them in psychological am¬ 
biguity and imprecise meanings and interpretations. While rela¬ 
tively easy to compute and apparently readily susceptible to 
empirical study, an invalid assumption of an interval scale 
would vitiate even their numerical precision. Thus, we have 
not one but many indices of reliability, each determined in 
a different way, and hence each implying a different meaning. 
We do not have, independently, a quantitative definition of 
the concept of reliability, psychologically derived, with a unique 
interpretation. We have a variety of meanings for the concept 
of reliability, depending upon the index used. It is our thesis
that the concept of reliability should have a unique psychological
meaning quantitatively defined, and the various indices
should then be regarded as different kinds of approximations 
to the concept. The challenge, then, would be to the experi¬ 
menter to devise indices which are better measures of the con¬ 
cept. 

III. A Psychological Rationale for the Concepts of Reliability and Homogeneity

The Fundamental Equation.—We shall now attempt to sketch
a theoretical psychological foundation for the derivation of 
quantitative definitions of certain concepts of test theory. 

Consider the concept of the difficulty of an item. We all 
have intuitive notions as to what the psychological meaning 
of the difficulty of an item is. It means how hard it is for some 
one to pass it. But we identify the difficulty of an item with 
the percentage of people passing it. We thus have a number to 
represent the difficulty of an item which is the same number 
for all the people in the sample. Yet we know that for some 
people the item was so easy that they passed it, and for others 
it was so difficult that they failed it. It is apparent that we 
must have a definition of the difficulty of an item which will 
permit different values for different people. Of course, such a 
definition could still permit an average difficulty corresponding 
in principle to the conventional definition. 

In order to develop a psychological rationale for the difficulty 
of an item let us consider an arithmetic problem. Let this 
arithmetic problem require that an individual know how to 
perform certain operations. The problem might involve addi¬ 
tion and subtraction, the use of log tables, and a certain amount 
of reasoning. Its solution requires a collection of abilities, each 
to a certain degree and combined in a certain way. We may, 
for the sake of simplicity in discussion, lump this particular 
combination of abilities and call it a single ability. The problem 
then requires that every individual possess at least a certain 
amount of this ability in order to solve it. We shall call the 
quantity of an ability required for the solution of a problem 
the Q value of that problem or that item.

Shall we regard this Q value of an item as its difficulty? We
might, if we wish, so define the difficulty of the item. But this
is not psychologically satisfying, because if we ask individuals 
how difficult an item is, some will say that it is easy and some 
will say it is difficult. How can the item have one Q value and 
yet give rise to all this disagreement about its difficulty? Ob¬ 
viously it must be because these different individuals are mak¬ 
ing their judgments from different points of view. A mathe¬ 
matics major says it is easy; a grammar school student says 
it is hard. The point of view depends on the amount of this 
particular ability the person has. Of the particular ability de¬ 
manded by the item, the amount possessed by an individual 
will be designated his C value, representing his capacity. 

We have now a hypothetical continuum on which is a Q
value representing the amount of an ability required by the
item from any individual to whom it is administered, and we
have also a C value on this same continuum for each individual
who attempts the item. How, then, shall we represent the
degree of difficulty that this item has for a particular individual?
This might be done in a number of ways. We have chosen
to use the ratio of Q to C to represent the psychological value
or difficulty of this item for that individual and have called
this ratio P, and thus we have the simple equation:

$$Q = PC \tag{1}$$

Obviously, the greater an individual’s capacity the smaller 
proportion of that capacity is required or exercised in solving 
the problem and the easier it appears to him. 
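A small numerical sketch of equation (1) may help fix the idea. The Q and C figures below are hypothetical and are placed on an arbitrary common continuum; the pass rule (P below one) follows the reading the text gives a few paragraphs later.

```python
def difficulty(q, c):
    """P value: the share of an individual's capacity C that an item's Q demands."""
    return q / c


def passes(q, c):
    """An item is passed when the capacity exceeds the requirement, i.e. P < 1."""
    return difficulty(q, c) < 1.0


q_item = 6.0  # hypothetical Q value of one arithmetic problem
capacities = {"mathematics major": 12.0, "grammar school student": 4.0}  # hypothetical C values

p_values = {who: difficulty(q_item, c) for who, c in capacities.items()}
outcomes = {who: passes(q_item, c) for who, c in capacities.items()}
```

The same item thus has a P of 0.5 for the first person and 1.5 for the second: one Q value, two different difficulties.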

Each time (h) an individual (i) responds to a stimulus (j) there
is a set of values which satisfy $Q_{hij} = P_{hij} C_{hij}$. The most frequent
objectives of psychological measurement are to determine
something about the Q values of each member of a set of stimuli
and the C values of each member of a group of individuals.

But note, and this is significant to our later problem of
metric, we do not observe Q values and C values. Instead, what
we observe are the P values. Thus, if an individual passes an
item, we know that on that particular ability the individual’s
capacity,³ $C_{hij}$, was greater than the quantity,³ $Q_{hij}$, required to
pass the item and hence the $P_{hij}$ value was less than one. In
the method of single stimuli, which is the method most used
in mental testing, we can divide the items into two categories
for each individual, those whose P values were less than one
for him, and those whose P values were greater than one.⁴
From such data on several individuals we want to extract what
information they contain about Q and C values. If we refuse
to make the assumptions which lead to an interval scale, exhaustive
analysis of these data would yield, at best,⁵ the order
of the stimuli (their Q values) and the order of the people (their
C values).

3 The subscript h is one here.
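The claim that pass-fail data yield, at best, two orderings can be illustrated with a toy case. The sketch below assumes the favorable conditions in which each item has a fixed Q and each person a fixed C; all values and function names are hypothetical.

```python
def pass_matrix(q_values, c_values):
    """Rows are individuals, columns are items; True where P = Q / C < 1."""
    return [[q / c < 1.0 for q in q_values] for c in c_values]


def rank_order(totals):
    """Indices sorted from the smallest total to the largest."""
    return sorted(range(len(totals)), key=lambda k: totals[k])


q_values = [2.0, 9.0, 5.0, 7.0]    # hypothetical item requirements
c_values = [6.0, 3.0, 10.0, 8.0]   # hypothetical individual capacities
observed = pass_matrix(q_values, c_values)

items_passed = [sum(row) for row in observed]         # one total per individual
times_passed = [sum(col) for col in zip(*observed)]   # one total per item

people_low_to_high = rank_order(items_passed)
items_hard_to_easy = rank_order(times_passed)

# Squaring every Q and C changes all the spacings but not a single pass or fail,
# so the data cannot distinguish the two sets of values.
same_data = pass_matrix([q ** 2 for q in q_values], [c ** 2 for c in c_values]) == observed
```

The totals recover the rank order of the people and of the items, but any order-preserving stretch of the continuum produces identical observations, so distances between Q or C values are not in the data.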

We might digress for a moment to point out that with other 
methods of collecting data, such as the method of rank order, 
the method of paired comparisons, and the method of triads, 
we are able to collect, successively, much more information 
about the P values of stimuli for each individual and hence 
learn more about Q values and C values than we do from the 
method of single stimuli used in mental testing. Curiously
enough it appears that we are going to be able to go further, 
with fewer assumptions, in the area of so-called qualitative 
attributes than in the area of mental testing. 

The Variance of an Individual’s Score .—Imagine now that 
we have a stimulus or test item and a group of individuals who 
respond to it. Each individual's response to the item provides 
a P value. Of course we do not know the exact magnitude of a 
P value, we know only whether it is less than one or greater 
than one, that is, whether the individual passed or failed the
item. But this is a limitation of this method of collecting 
data. Let us imagine that we had a method which would give 
us the exact P values. There would be, then, a distribution of 
P values for the stimulus. This distribution represents the 
distribution of difficulties which the item has for the individuals 
in the group. 

Each individual has one of the P values in this distribution.
Let us imagine that we could again administer this item to this
same group of individuals independently⁶ of its previous administration.
Then, once again, each individual would have a
P value for this item. Would the successive P values of an
individual for the one stimulus be identical, even if the successive
administrations were independent? This is a question
of whether or not $P_{hij}$ is constant over h for a given i and j
and can only be answered by experiment. It might well be
that in the case of one attribute, say arithmetic, these successive
P values would be almost constant for any given individual,
whereas in the case of another attribute, say the aesthetic
merit of a painting, the P values might be greatly variable.
In this latter case we would expect the P values to be variable
if the individual was not too clear as to just what he meant by
aesthetic merit and hence used different criteria in successive
evaluations of the painting. Thus, if the continuum is intrinsically
different at different times, both the Q values of
the stimulus and the C values of the individual would be variable
for the same nominal trait, like aesthetic merit, because the
exact composition of the trait was variable.

4 We have avoided the complication introduced by the true-false and multiple-choice
type of item in which an individual may get an item right by pure chance.
There is no need for this complication from the point of view of constructing a theory.

5 The conditions necessary are that the $Q_{hij}$ be constant over h and i and the $C_{hij}$ be
constant over h and j. For purposes of future generalization these constitute an extreme
of class i conditions (i).

6 Experimental independence.

We have conceived, now, of each individual in a group hav¬ 
ing responded a number of times to a stimulus and, hence, for 
each individual, i, there is a distribution of $P_{hij}$ values for the
stimulus j. Let us now do the same thing for more stimuli, and 
imagine that there is for every individual a small distribution 
of his P values for each stimulus within the total distribution 
of all individuals’ P values for each stimulus. The notation 
used is as follows: 

h = 1, 2, ..., t (the number of times an individual responds to
a stimulus)

i = 1, 2, ..., N (the number of individuals)
j = 1, 2, ..., n (the number of stimuli)

$$P_{ij} = \frac{1}{t} \sum_{h} P_{hij}$$

$$P_{j} = \frac{1}{Nt} \sum_{h} \sum_{i} P_{hij}$$

$$P = \frac{1}{Nnt} \sum_{h} \sum_{i} \sum_{j} P_{hij}$$





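We are now in a position to define the status score, $S_i$ (2),
of an individual as follows:

$$S_i = \frac{1}{nt} \sum_{h} \sum_{j} (P_{j} - P_{hij}) \tag{2}$$

or

$$S_i = P - P_i \tag{3}$$

where $P_i = \frac{1}{nt} \sum_{h} \sum_{j} P_{hij}$ is the average difficulty of all the items for individual i.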

To put the status score of an individual in words, it is de¬ 
fined as the average difficulty of all the items for all individuals 
minus the average difficulty of all the items for him alone. Thus, 
we have made the score of the individual dependent upon the 
composition of the group of individuals of which he is a member. 
On this scale the average individual has a score of zero, and 
the better the individual the higher his score, since the easier 
the items are for an individual the smaller the proportion of 
his capacity is required to pass them and the larger would be 
Si. Individuals below average would have negative status 
scores. 
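A brief computational sketch of the status score follows. It pretends, as the text does, that exact P values are available; the array P[h][i][j] and its figures are hypothetical (two occasions, three individuals, two items), and the function names are illustrative only.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical exact P values indexed P[h][i][j]: t = 2, N = 3, n = 2.
P = [
    [[0.5, 0.8], [1.0, 1.4], [1.5, 2.0]],
    [[0.7, 0.6], [1.0, 1.2], [1.7, 2.0]],
]
t, N, n = len(P), len(P[0]), len(P[0][0])


def item_mean(j):
    """Mean difficulty of item j over all occasions and all individuals."""
    return mean(P[h][i][j] for h in range(t) for i in range(N))


def status_score(i):
    """Mean of (item mean minus the individual's own P value) over occasions and items."""
    return mean(item_mean(j) - P[h][i][j] for h in range(t) for j in range(n))


grand_mean = mean(P[h][i][j] for h in range(t) for i in range(N) for j in range(n))
person_mean = [mean(P[h][i][j] for h in range(t) for j in range(n)) for i in range(N)]
scores = [status_score(i) for i in range(N)]
```

For these figures the double-sum form and the short form (grand mean minus the individual's own mean) agree, the scores sum to zero over the group, and the individual for whom the items are easiest receives the highest score.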

Inasmuch as, in principle, an individual has a score, an $S_i$, on
every item every time he takes it, let us consider the composition
of the variance of all these “scores” that get averaged
together for a total score. If we designate by $V_i$ the total variance
of an individual, we have

$$V_i = \frac{1}{nt} \sum_{h} \sum_{j} (P_{j} - P_{hij})^2 - S_i^2 \tag{4}$$

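By adding and subtracting $P_{ij}$ inside the parentheses, expanding
and collecting terms, the expression for $V_i$ becomes:

$$V_i = \frac{1}{nt} \sum_{h} \sum_{j} (P_{ij} - P_{hij})^2 + \frac{1}{n} \sum_{j} (P_{j} - P_{ij})^2 - \left[ \frac{1}{n} \sum_{j} (P_{j} - P_{ij}) \right]^2 \tag{5}$$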

Making the following definitions,

$$D_i^2 = \frac{1}{nt} \sum_{h} \sum_{j} (P_{ij} - P_{hij})^2 \tag{6}$$

$$T_i^2 = \frac{1}{n} \sum_{j} (P_{j} - P_{ij})^2 - \left[ \frac{1}{n} \sum_{j} (P_{j} - P_{ij}) \right]^2 \tag{7}$$

we have

$$V_i = D_i^2 + T_i^2 \tag{8}$$

and $V_i$ is seen to have two components. These two components,
$D_i$ and $T_i$, are of psychological significance. The first component,
$D_i$, we call the individual’s dispersion score and it
represents the variability within an individual in repeatedly 
responding (independently) to the same stimulus, summed over 
all the stimuli. Di reflects an individual’s internal consistency 
in responding repeatedly to the same stimuli. The contribution 
that is made to this component by each stimulus is essentially 
the precision of the individual’s score on each item, and when 
summed over the items is a measure of the precision of the 
individual’s total score on the test. 

The Ti component describes the variability of the individual’s 
mean position within the group as the group passes from stimu¬ 
lus to stimulus. We call this score the individual’s trait score. 

Thus, we now have two concepts to represent the hypo¬ 
thetical behavior of an individual in response to repeated inde¬ 
pendent presentations of a set of items. We have the concept 
of a dispersion score which represents the precision of an indi¬ 
vidual’s final total score on the test. And we have the concept 
of trait score which represents the stability of an individual’s 
position within the group in passing from item to item. 
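The decomposition of an individual's total variance into a dispersion part and a trait part can be checked numerically. The sketch below assumes exact P values were somehow available; the figures in P[h][i][j] are hypothetical and the function names are illustrative.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical exact P values indexed P[h][i][j]: t = 2, N = 3, n = 2.
P = [
    [[0.5, 0.8], [1.0, 1.4], [1.5, 2.0]],
    [[0.7, 0.6], [1.0, 1.2], [1.7, 2.0]],
]
t, N, n = len(P), len(P[0]), len(P[0][0])


def decompose(i):
    """Return (total variance, squared dispersion score, squared trait score) for individual i."""
    # Item means over all occasions and individuals.
    p_j = [mean(P[h][k][j] for h in range(t) for k in range(N)) for j in range(n)]
    # This individual's mean on each item over occasions.
    p_ij = [mean(P[h][i][j] for h in range(t)) for j in range(n)]

    # Total variance of the per-item, per-occasion "scores" (item mean minus own P value).
    dev = [p_j[j] - P[h][i][j] for h in range(t) for j in range(n)]
    total = mean(d * d for d in dev) - mean(dev) ** 2

    # Within-individual variability around his own item means.
    dispersion = mean((p_ij[j] - P[h][i][j]) ** 2 for h in range(t) for j in range(n))

    # Variability of his mean position relative to the group from item to item.
    gaps = [p_j[j] - p_ij[j] for j in range(n)]
    trait = mean(g * g for g in gaps) - mean(gaps) ** 2
    return total, dispersion, trait


parts = [decompose(i) for i in range(N)]
```

For every individual in this toy array the total equals the sum of the two components, as the algebra requires, because the cross-product term vanishes when deviations are taken around the individual's own item means.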

Reliability and Homogeneity.—We shall now identify $D_i$ and
$T_i$ with the concepts of reliability and homogeneity, respectively.
We have here precise definitions of concepts from a
psychological rationale such that the concepts may be manipu¬ 
lated mathematically and are susceptible to rigorous logic. 

We shall use the terms $D_i$, dispersion score, precision, and
reliability interchangeably; and the terms $T_i$, trait score, and
homogeneity interchangeably. First, it is apparent from the 
mathematical definition of the concept of precision that it is a 
characteristic of an individual’s behavior on the items compris¬ 
ing the test, and does not necessarily have the same value for 
every individual who takes a particular test. To put this in the 
more common terms of test theory, the reliability of a test or, 
as we define it, the precision of an individual’s test score, may 
be different for every individual who takes the test. It is an 
approximation of unknown degree to assign the same coefficient
to all individuals. This approximation, perhaps, would be
reasonably close in the case of some mental tests, but in others
the individual differences in $D_i$ might be considerable.

The relation between reliability and homogeneity is an inter¬ 
esting one. In principle we could construct a test which would 
have high precision, or reliability, and such that the items would 
have zero intercorrelations, or, for that matter, any values 
from plus one to minus one. Thus, if a man’s score on one item 
was the number of children he has and on another item his 
cephalic index, and on a third item the number of clubs and 
societies he belongs to, his total score would have very high 
reliability. It does not necessarily follow, however, that the 
score means anything—that it represents a point on a con¬ 
tinuum which is a psychological trait continuum. Obviously, 
then, the fact that one has high precision for a test score has 
no bearing on whether or not one is measuring some kind of 
meaningful psychological entity. If one takes a number of 
things which are qualitatively different and adds up the scores 
on these different things for each individual, then the total 
scores will be a set of numbers which may have the property 
of precision but will have no common quality. 

Let us turn now to the trait score which we identify with 
homogeneity. This denotes the stability of an individual’s posi¬ 
tion within a group. Such a measure would not be an exclusive 
property of an individual, as in the case of precision, but is a 
property of the group as a whole on the test, and hence 
should be averaged over the individuals.

The significance of this concept lies in its indicating the 
degree to which the final total scores of individuals have some 
common quality or represent a psychological entity for the 
group. The expression for the trait score, $T_i$, averaged over
individuals, is essentially equivalent to the notion of correlation
between items, except that it is expressed in terms of variance
rather than correlation or covariance.⁷

Thus, if we have a test consisting of a number of items, each
from a different primary mental ability, we would expect the
position of the individual within the group from item to item
to be variable. This is on the premise that there are intra-individual
differences in ability. On the other hand, if the test
were a set of arithmetic items then the position of the individual
within the group as it passed from item to item would probably
be relatively stable and there would be a high degree of homogeneity.
These two tests might well have equally high reliability
but quite different homogeneities.

7 Another way of looking at $D_i^2$ and $T_i^2$ is by analogy with error variance and true
variance in conventional test theory. The analogy between $D_i^2$ and error variance is
justified. But $T_i^2$ is a variance generated by lack of homogeneity among the items.
Hence, in the sense used here, the “true variance” would represent the degree to
which the items failed to constitute an organized and integrated common trait.

In principle, the two components $D_i$ and $T_i$ are independent
and it is not difficult to imagine a test with perfect precision 
for all individuals, or perfect reliability, and with a degree of 
homogeneity anywhere from zero to perfect. On the other hand, 
in a probability sense, it would perhaps be much more difficult 
to construct a test with perfect homogeneity but with low pre¬ 
cision. Such a relation is implicit in the reasoning behind the 
attempt to increase the reliability of a test by means of an item 
analysis against an internal criterion. 

Indices .—We have reached a point now where we must con¬ 
sider again the distinction between the defined meaning of a 
concept and the index which presumably is a measure of the 
concept. What we have tried to do is to provide meaningful 
definitions of the concepts of precision and homogeneity but we 
have not provided an index for either one of these concepts. An 
index is simply a method of analyzing data to get certain infor¬ 
mation. Hence, in order to compute a meaningful index, the 
data must contain this information. Consider, for example, 
what is required of the data so that they will contain informa¬ 
tion about the precision of an individual’s score. We can see 
that to get a measure of precision, that is, to compute an indi¬ 
vidual’s dispersion score, requires repeated independent re¬ 
sponses from him to the same item. The method of single 
stimuli conventionally used in mental testing does not provide 
such observations. Thus, it appears that with conventional 
testing methods an index of the reliability of a test score is 
indeterminate and there is no valid formula for reliability. On 
the other hand, the $T_i$ component of an individual’s total
variance requires only one observation per individual per stimulus
and, hence, data collected by the method of single stimuli
do contain information pertaining to the concept of homogeneity.
But samples of size one are poor estimates of the mean
of a distribution. Nevertheless, they can be used to get an esti¬ 
mate of the variance between distributions which is, however, 
contaminated by the variance within the distributions. The two 
components, Di and Ti, of the total variance cannot be separated
in data collected by the method of single stimuli. In other
areas, a method for collecting data like the method of paired 
comparisons or the method of triads does provide information 
pertaining to both components and it is possible in principle to 
measure them both. 
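
The argument about single observations can be illustrated with a small simulation. The sketch below is not part of the original analysis. It assumes, purely for illustration, that one individual has a different mean response level on each item (spread `between_sd`), that he responds around that level with dispersion `within_sd`, and that the total variance is the simple sum of the two components. All function names and numerical values are hypothetical.

```python
import random
import statistics


def simulate_responses(n_items, n_repeats, between_sd, within_sd, seed=0):
    """Return n_items lists, each holding n_repeats responses of one individual."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_items):
        item_mean = rng.gauss(0.0, between_sd)  # individual's level on this item
        data.append([rng.gauss(item_mean, within_sd) for _ in range(n_repeats)])
    return data


def variance_components(data):
    """Estimate (within, between) variance from repeated responses per item."""
    n_repeats = len(data[0])
    within = statistics.mean(statistics.variance(r) for r in data)
    var_of_means = statistics.variance([statistics.mean(r) for r in data])
    # Item means still carry within-item noise of size within / n_repeats.
    between = var_of_means - within / n_repeats
    return within, between


# Assumed true values: within variance 1.0, between variance 4.0.
repeated = simulate_responses(n_items=2000, n_repeats=5, between_sd=2.0, within_sd=1.0)
within_est, between_est = variance_components(repeated)

# Method of single stimuli: keep only the first response to each item.
single = [responses[0] for responses in repeated]
total_est = statistics.variance(single)

print(round(within_est, 2), round(between_est, 2), round(total_est, 2))
```

With repeated responses the two components are estimated separately. With one response per item only their sum is recoverable, and no arithmetic performed on `single` alone can apportion it between the two.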

Essentially, what we have done is to give the quantitative 
definition of concepts based on a psychological rationale prece¬ 
dence over the statistical procedure of computing an index and 
then arguing about what the index means. We have chosen to 
have meaningful concepts and to recognize that our measures 
of them are inadequate and approximate rather than to take 
the measures as experimental facts and try to give them psycho¬ 
logical meaning with consequent ambiguity and controversy. 

What is it, then, that we do get from our indices of reliability 
or homogeneity? It is apparent that we can have no clear index 
of either the precision of a test score or the homogeneity of a 
test from conventional testing methods. Every index designed 
to represent one or the other actually represents a joint effect. 
The various indices merely differ in the nature of their approxi¬ 
mation, then, to Vi, the left hand side of equation (8), summed 
over all individuals. 

Inasmuch as this Vi is also the variance of an individual’s
score just as one of its components, Di, is, one might ask what
the difference is between them. The difference is that Di, the
variability within an individual, is the degree of precision of a
score on the test. Vi, the left hand side of the equation, is the
precision of the individual’s score on the attribute, the domain
which the sample of items represents. Obviously, the homogeneity
of the items in a test has nothing to do with the precision
of a score on the test. But, obviously, this same score,
when regarded as an estimate of the individual’s score on the 
domain or attribute of which the items constitute a sample, is 
dependent upon the homogeneity of the domain. The greater
the homogeneity of the domain, the more alike will be the
scores of an individual on successive samples of items from that 
domain. 

IV. Next Steps

As we see some of the implications of this for the further 
development of test theory, there appear to be three general 
alternatives, the first of which has two sub-alternatives: 

1. Continue with the method of single stimuli as a method
of collecting data. Then we can do one of two things: (a) make 
the necessary assumptions to achieve an interval scale and 
hence have numbers to manipulate, 8 or (b) drop the assump¬ 
tions which lead to an interval scale and substitute Lazarsfeld's 
latent structure analysis (4). The first sub-alternative above is 
to continue in the conventional manner. This will permit easily 
accomplished empirical studies in which we could rarely have 
firm confidence and unambiguous interpretation. The second 
sub-alternative requires going in an entirely new direction. Laz- 
arsfeld’s latent structure analysis is a non-metric theory for 
the scaling of data collected by the method of single stimuli. 
Obviously, his theory could be taken over bodily by test 
theorists, although from a practical point of view there are 
still computational hurdles. Such difficulties, however, are mere 
mechanical limitations and are not defects of the theory. 

2. A second general alternative is to discover or to develop
a new method for collecting data which would enable us to put 
the items in rank order for each individual as to how well he 
passed them and how badly he failed them. If we could collect 
such data we would then have data which, with very simple 
assumptions, contain information about metric relations be¬ 
tween stimuli and individuals (1). 

3. A third alternative is to discover or to develop a new 
method for collecting data which would be equivalent to the 
method of paired comparisons. This would require repeated
independent responses to each stimulus. Such data would con¬ 
tain information on the metric relations between stimuli and 
individuals, and, in addition, information on the two components
of precision and homogeneity, making a precise distinction
between them possible.

8 A better sub-alternative here is to experimentally validate the assumptions of
an interval scale if this is possible.


V. Summary

We have tried to show that the assumptions required for an 
interval scale and the identification of indices with concepts 
are serious obstacles to the further development of test theory.
We have then developed a rational basis for defining the diffi¬ 
culty of a test item for an individual and, from this basis, 
developed mathematical expressions for the concepts of reliability
and homogeneity. It was then made apparent that the
measurement of reliability and homogeneity from the analysis 
of data collected by the method of single stimuli is not possible, 
as such data do not contain the necessary information. Several 
alternative directions for the further development of test theory 
are pointed out. 

REFERENCES 

1. Coombs, C. H. “Psychological Scaling Without a Unit of Measurement.”
   Psychological Review (in press).
2. Coombs, C. H. “Some Hypotheses for the Analysis of Qualitative
   Variables.” Psychological Review, LV (1948), 167-74.
3. Stevens, S. S. “On the Theory of Scales of Measurement.” Science,
   CIII (1946), 677-80.
4. Stouffer, S. A., et al. Measurement and Prediction. Princeton:
   Princeton University Press, 1949.
5. Thomas, L. G. “Mental Tests as Instruments of Science.”
   Psychological Monographs, LIV (1942), No. 3.
6. Thorndike, R. L. “Logical Dilemmas in the Estimation of Reliability.”
   National Projects in Educational Measurement. Series I. Reports of
   Committees and Conferences. XI (1947), 21-40.



PROBLEMS IN MEASURING THE EFFECTIVENESS 
OF PROFESSIONAL EDUCATION 


DONALD K. BECKLEY 
Simmons College 

In the process of completing a recent study of the effective¬ 
ness of one area of professional education, a number of prob¬ 
lems arose that may well be of interest to others planning in¬ 
vestigations of a similar nature. For this reason, this article 
has been prepared to describe some of these problems and the 
methods by which they were met. The study concerned was 
made to ascertain the effectiveness of college training for ex¬ 
ecutives in retailing in terms of selected objectives determined 
to be desirable. To do this, the performance of retailing gradu¬ 
ates in respect to these objectives was measured by means of 
an achievement examination and compared with the perform¬ 
ance of other groups. 

Selecting Groups for Comparison 

A question arose at this point of what groups to use for pur¬ 
poses of comparison. It was recognized that, as in other areas 
of professional and vocational education, objectives thought 
to be desirable might very possibly be attained by means of 
work experience as well as through formal college training. A 
study of this nature could be helpful in identifying those ob¬ 
jectives that could best be taught by means of formal college 
training and those for which work experience itself was best
suited. 

In appraising the effectiveness of formal training, it was thus 
necessary to take into consideration both formal training and 
work experience as factors to be measured in respect to achieve¬ 
ment of the selected objectives. In order to have these two fac¬ 
tors appear in all possible combinations, it was necessary to 
find subjects in each of these four groups: (1) no training, no
work experience, (2) training, no work experience, (3) work
experience, no training, and (4) both training and work experience.
Subject groups who met these requirements were
obtained through the use of these categories, in which the dis¬ 
tinguishing characteristics are the presence or absence of the 
two factors: 

1. Incoming students at the Simmons College Prince School 
of Retailing who have neither studied retailing in formal 
courses nor had extensive work experience. 

2. Students who have completed the course in retailing at 
the Prince School of Retailing, but have not yet had ex¬ 
tensive work experience. 

3. Employees in Boston stores who are in positions of the 
kind graduates soon will be taking, but who have had no 
formal retail training. 

4. Store executives and junior executives who have had a 
specified amount of store experience and also are gradu¬ 
ates of the Prince School of Retailing. 

Because two programs of retail training are offered at Sim¬ 
mons College where this study was made, it seemed appro¬ 
priate also to consider educational level as another factor. 
Hence, within each of the four groups were two sub-groups, 
one consisting of students who had completed a four-year 
undergraduate college liberal arts program, and the other in¬ 
cluding those who had spent only two years in liberal arts 
study before beginning their retail training. 

The purpose of the study, then, was to determine the strength 
of these three factors: (1) formal retail training, (2) retail
work experience, and (3) undergraduate college education in
respect to achievement of selected retailing objectives. The
hypothesis to be applied was that groups initially comparable
in all respects but differing in their treatment should reflect 
differences in achievement that are the result of that particular 
treatment. 

The nature of the experiment can best be indicated by ar¬ 
ranging the data in the following design: 

                         No training                     Training
                  2 yrs. coll.  4 yrs. coll.    2 yrs. coll.  4 yrs. coll.
No experience        N = 36        N = 30          N = 29        N = 28
Experience           N = 29        N = 32          N = 12        N = 10





The basic test-score data are presented in Tables 1 and 2,
in which the letters refer as follows: (a) no training, no work
experience, (b) training, no work experience, (c) work experience,
no training, and (d) both training and work experience.
Numeral 1 refers to students with four years of college preparation,
and numeral 2 refers to students with two years of college.

It was recognized that any statistical design selected could
not be adequately precise when uncontrolled variables still
remained. In this study, intelligence of the subject was measured
by the Wonderlic Personnel Test, and sex differences were
eliminated by having only women as the subjects. Although
the age levels of the two sub-groups differ by several
years by definition, calculation of critical ratios indicated that
none of the differences between the means of the various groups
were significant, thus minimizing age as a factor here.

                              TABLE 1
              Means of Scores on Retailing Examination

                                       Test Scores
Group     N     Total       I       II      III      IV       V
a-1      30     41.47     9.13    8.07    9.07    8.77    3.80
b-1      28     59.57    12.72   11.54   14.39   13.50    7.41
c-1      32     50.01    11.75    9.34   11.41   10.41    7.91
d-1      10     60.10    12.70   11.30   14.60   12.70    8.80
a-2      36     36.41    10.01    7.86    7.67    8.39    5.06
b-2      29     56.17    13.13    9.54   14.17   11.34    7.82
c-2      29     44.31    11.09    7.58   10.14    8.62    6.38
d-2      12     50.25    11.66    8.33   12.33   14.33    5.58

                              TABLE 2
       Standard Deviations of Scores on Retailing Examination

                                  Test Scores
Group     N        I       II      III      IV       V
a-1      30      2.11    2.02    2.02    3.07    2.21
b-1      28      1.43    1.68    1.97    2.23    1.86
c-1      32      2.09    1.73    2.81    3.09    2.95
d-1      10      1.62    2.05    1.16    2.61    1.54
a-2      36      1.62    2.46    2.26    2.24    2.09
b-2      29      1.18    2.20    1.69    2.19    1.53
c-2      29      1.71    2.40    2.90    1.47    2.59
d-2      12      2.09    4.13    1.95    M 7     2.69

Selecting Subjects for Administration 

Some difficulties were encountered in obtaining an adequate 
sample of subjects in all of the groups. Categories 1 and 2
consisted of incoming and outgoing students, hence they were
readily available to take the retailing examination. Through 
the cooperation of a large Boston department store, a com¬ 
parable number of subjects in group 3 was made available. 
In the case of group 4, with both formal training and work 
experience, obtaining subjects was more difficult. In order to 
have work experience comparable in amount and degree to 
that of subjects in group 3, it was necessary to select graduates 
of the School who had been working for approximately one to 
two years. Because of the small number of graduates with 
this amount of experience, the total number of possible subjects 
was definitely limited. A practical difficulty faced here was that 
most of the 34 eligible subjects lived away from Boston, and, 
in fact, covered most parts of the United States. It was not prac¬ 
ticable to talk with them in person, or to administer the ex¬ 
amination personally, as was done with the other groups, and 
the only feasible method of reaching them was by mail. A 
letter was sent to each of these people requesting her assist¬ 
ance and enclosing the examination materials together with 
detailed directions as to the procedures to be followed. A follow¬ 
up card was sent to those who did not return the completed 
materials by the date suggested, and the final return consisted 
of 22 cases. 

Because of the nature of the questions asked in the retailing 
examination, it seemed unlikely that more than a few of the 84 
objective questions could be answered readily through the use 
of notes or texts. In view of the explanation that the average 
scores of each group rather than individual scores were of in¬ 
terest in the investigation, it further seemed unlikely that any 
of the subjects would have sought to use outside help in an¬ 
swering the questions. In the case of the Wonderlic Personnel 
Test there was the question of whether or not the subjects 
had adhered to the specified time limit. Each score was checked 
in terms of the subjects’ previous academic performance as a 
student, and any earlier intelligence scores available. In the 
case of two subjects whose earlier record did not seem to justify 
the very high intelligence scores received, deductions were 
made arbitrarily to make their scores approximate the mean 
of the group excluding these two scores, where they would not 



measuring effectiveness of professional education Si 

influence the group computations. In all other cases, the scores 
received appeared by inspection of the records available to be 
entirely probable, and hence were accepted as having been 
done under the conditions specified. 

Checking the Reliability of the Examination 

When the examination in retailing had been constructed, 
the question arose as to what measure to use in determining its 
reliability. This question might better have been considered 
before rather than after the examination was made. Because 
of the conditions under which this examination in retailing was
developed and was to be administered, it was not feasible to 
measure reliability through the use either of a retest or of equiv¬ 
alent forms. Thus, it appeared that some use of the split-half 
technique or application of the Kuder-Richardson formulae 
was appropriate here. Originally the split-half technique was 
rejected because the examination had not been properly 
planned for the measurement of reliability, and there would 
have been an item discarded from each of several sub-groups 
when the odd and even items were matched. The Kuder-Richardson
formula number 10, which gives an estimate of the
reliability of a test when the numbers of items, the standard 
deviation, and the average variation of the items are known, 1 
has been described as superior to coefficients obtained by the 
split-half method, because any error due to bias in splitting a 
test is eliminated. 2 
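
A coefficient of this kind can be sketched in a few lines. The formula number cited above cannot be checked against the source here, so the sketch assumes the widely used Kuder-Richardson formula 20, which requires the number of items, the variance of the total scores, and the variance of each item scored right or wrong. The function name `kr20` and the response matrix are hypothetical and are not data from this study.

```python
import statistics


def kr20(item_scores):
    """Kuder-Richardson formula 20 for 0/1 item scores (rows = examinees)."""
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    total_var = statistics.pvariance(totals)
    # Each item contributes p * q, its variance when scored 0 or 1.
    sum_pq = 0.0
    for j in range(n_items):
        p = statistics.mean(row[j] for row in item_scores)
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_var)


# Hypothetical responses of five examinees to four items.
rows = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(rows)
print(round(reliability, 3))
```

Because every item enters the computation, no decision about how to split the test is required, which is the advantage claimed over the split-half coefficient.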

Because the examination in retailing was divided into five 
sets of items representing the five objectives being measured, 
it was desirable to estimate reliability coefficients for each 
objective separately. Similarly, the four groups to whom the 
examination was administered were different, and also were 
treated separately. Except for group 2, students who were
tested at the time they were finishing their course in retailing
and thus a highly homogeneous group in respect to test performance,
all groups had reliability coefficients ranging between
.514 and .900. When reliability was calculated by the
split-half method in spite of the objection mentioned earlier,
the coefficients for group 2 were shown to be higher than
originally calculated, and within the range indicated for the
other groups, thus leading to the conclusion that the examination
was adequately reliable for group use.

1 See Kuder, G. F. and Richardson, M. W. “The Theory of the Estimation of Test
Reliability.” Psychometrika, II (1937), page 158.

2 See Jackson, R. W. B. and Ferguson, G. A. Studies on the Reliability of Tests,
Bulletin No. 12, Dept. of Educational Research. Toronto: University of Toronto.

Planning an Experimental Design 

Perhaps the most important problem in undertaking a statis¬ 
tical study is the selection of an experimental design with a suf¬ 
ficiently high degree of precision to answer the questions de¬ 
sired. The problem here was to select a design to indicate 
whether or not differences in gains among groups of students 
were greater than would be expected from the operation of 
chance factors alone. 

A technique often used in investigations such as this is the 
matching of pairs. It would have been possible to match pairs of
cases within each pair of groups in this experiment, but the un¬ 
equal number of cases would have proved to be a disadvantage 
in that many cases in the larger groups would be left over after 
pairs were matched. 

One technique appropriate for use in this type of experiment 
is the analysis of variance. As described by Lindquist, 3 the 
variance of a sample can be analyzed into two components: the 
within-groups variance and the between-groups variance. If 
the hypothesis of random sampling is correct, the two estimates 
of variance would normally differ only by chance. The F test, 
known also as the variance ratio, indicates at the desired level 
of significance whether or not the two estimates of variance differ
by more than would be expected by chance. If so, there is reason
to believe the hypothesis to
be false. 
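
The variance ratio just described can be computed directly from the group scores. The following is a minimal sketch for a single classification. The function name `f_ratio` and the three small groups are hypothetical. The resulting F would still have to be referred to a table of F for the stated degrees of freedom.

```python
import statistics


def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means about the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores about their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three groups.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = f_ratio(scores)
print(f_value, df_between, df_within)
```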

In this experiment, however, it seemed especially desirable 
to ascertain the strength of the relationship among the factors. 
This measurement was not available through the use of analysis 
of variance, and the Peters’ regression technique was used. The 
covariance technique could be used to account for the initial 
lack of equivalence of groups and also in estimating the reliability
of differences between the adjusted final means. The
Peters’ technique seemed preferable, however, because it provided
an index of the strength of the relationship comparable to
a coefficient of correlation. This involved the matching of the
experimental and control groups through use of a regression
technique which does not require pair-by-pair matching. This
treatment made it possible to know whether the three experimental
groups did better on the achievement examination in retailing
than would be expected in view of their intelligence test
scores. The hypothesis tested here was that there were no real
differences produced by the factors introduced, and that any
differences in final mean scores, after allowances had been made
for chance differences in initial mean scores, were due entirely
to chance fluctuations in random sampling. 4

This technique has been described by Peters 5 as follows:

The method involves setting up a regression equation in rectilinear
form based on the statistics of the control group, then
predicting by it what should be the achievement scores of the
members of the experimental group if they were just like the
control group members; if, that is, the experimental factor produced
no differential effect. We can, then, determine the differential
effect for the experimental factor by the extent to
which the average achievement of the experimental group exceeded
or fell short of that predicted for it by the regression
equation.

While similar in many respects to Fisher’s covariance technique,
the Peters’ technique makes the regression equation from
the statistics of the control group rather than from the experimental
and control groups pooled, on the ground that a pooled
estimate would be a meaningless hybrid if the two groups differed
by reason of the experimental factor, as probably would
be the case. 6

3 Lindquist, E. F. Statistical Analysis in Educational Research. Boston: Houghton
Mifflin Company, 1941. Page 76.

4 Ibid., p. 181.

5 Peters, C. C. “A Method of Matching Groups for Experiment with no Loss of
Population.” Journal of Educational Research, XXXIV (1940), 70-74.

6 Peters, C. C., et al. “Research Methods and Designs.” Review of Educational
Research, XV (1945), 377-393.

The use of the regression technique is especially appropriate
here, since it is recognized that there is a positive correlation between
academic aptitude or intelligence, particularly verbal
ability, and scores on the retailing examination. In this experiment,
a regression equation was calculated from the scores obtained
on an intelligence test and the retailing examination by
the control group with neither retail training nor work experi¬ 
ence. This equation was then used to predict the retailing exami¬ 
nation score from the intelligence-test score for those in each of 
the three other groups, which were regarded as experimental 
groups. This predicted score was then compared with the actual 
score of each case in the experimental groups, and the signifi¬ 
cance of the difference between means of predicted and actual 
scores was tested for each objective in each sub-group separately. 
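
The procedure just outlined may be sketched as follows. This is an illustration only: the function names and the scores are hypothetical, a single straight-line regression on one matching variable is assumed, and the special standard error formula needed to test the resulting difference is not reproduced.

```python
import statistics


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_gain(control_iq, control_exam, exper_iq, exper_exam):
    """Mean of (actual - predicted) examination score in the experimental group."""
    # The regression equation comes from the control group alone.
    slope, intercept = fit_line(control_iq, control_exam)
    predicted = [intercept + slope * iq for iq in exper_iq]
    return statistics.mean(a - p for a, p in zip(exper_exam, predicted))


# Hypothetical intelligence-test and examination scores.
control_iq, control_exam = [20, 25, 30, 35], [30, 35, 40, 45]
exper_iq, exper_exam = [22, 28, 34], [40, 47, 52]
gain = mean_gain(control_iq, control_exam, exper_iq, exper_exam)
print(round(gain, 2))
```

A positive value indicates that the experimental group scored higher than would be predicted for control subjects of the same intelligence-test standing.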

A major problem in this connection concerned the standard 
error formula appropriate for use here. In this situation the 
different numbers of cases in the control and experimental 
groups do not affect the standard error formula, but the dif¬ 
ference in the means of the matching scores of the control and 
experimental groups requires an adjustment for that difference.
Thus, instead of the conventional formula for calculating the 
standard error of the difference between means, a special 
formula as stated by Peters and Van Voorhis 7 must be used 
because the groups are not perfectly equated on the basis of 
the matching factors. The differences between the means at¬ 
tained in the various tests were then divided by the standard 
errors of the differences in order to determine the t-ratios. 

The Peters' regression technique, described above, served to 
indicate clearly the level of significance of the mean differences 
in achievement scores when the groups were equated for intelligence,
but it did not identify the relative strength of the
factors being measured. The problem thus arose of how to 
measure the magnitude of the relationship between achievement 
in retailing and the several factors to be isolated: retail training, 
work experience, and college education. Some measure of corre¬ 
lation was needed here to indicate the strength of relationship 
between achievement and each of these factors with the other 
factors held constant. 

The Kelley correlation ratio, ε, was found to be an appropriate
statistical treatment for this purpose, particularly because
it is not affected by disproportionate numbers of cases
in the various groups. As described by Peters and Van Voorhis, 8
when corrected, ε has a standard meaning free from bias and
independent of the size of the population of the sample and of
the number of classes into which the sample is divided. It has
been shown to have all the merits of analysis of variance, and,
in addition, is interpreted positively rather than negatively, as
in the case of the t- and F-scores involving the null hypothesis.

7 Peters, C. C. and Van Voorhis, W. R. Statistical Procedures and Their Mathematical
Bases. New York: McGraw-Hill Book Company, 1940.
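
The relation between the uncorrected correlation ratio and its corrected form may be sketched as follows. The correction shown is the one usually attributed to Kelley, in which the within-class and total sums of squares are each divided by their degrees of freedom. Whether the present study computed it in exactly this way is an assumption, and the function name and scores are hypothetical.

```python
import statistics


def correlation_ratios(groups):
    """Return (eta squared, Kelley's corrected epsilon squared)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    # Uncorrected ratio: proportion of the total sum of squares between classes.
    eta_sq = 1.0 - ss_within / ss_total
    # Corrected ratio: the same comparison made on unbiased variance estimates.
    epsilon_sq = 1.0 - (ss_within / (n - k)) / (ss_total / (n - 1))
    return eta_sq, epsilon_sq


# Hypothetical scores for three classes.
classes = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
eta_sq, epsilon_sq = correlation_ratios(classes)
print(eta_sq, epsilon_sq)
```

The corrected value is always the smaller of the two, and the shrinkage is greatest when the classes are many and the cases few.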

A problem, however, was how to set up the data in this study 
to make possible meaningful analysis. One plan used was to set 
up direct comparisons of various pairs of subject groups in order 
to isolate each of the three factors to be measured. For example, 
to isolate the factor of formal training, group 1 (no work, no
training) was compared with group 2 (no work, training); and
group 3 (work, no training) was compared with group 4 (work, 
training). By this kind of classification, direct comparisons were 
made between various pairs of groups, thus holding constant 
the factor present in or absent from both groups. 

Although useful to some extent in measuring strength of relationship
of the various factors, the ε treatment described
above was not entirely satisfactory, and some further classification
was sought whereby two of the three factors to be measured
could be isolated simultaneously while the strength of the third
factor was being measured. As described by Peters and Van
Voorhis, 9 there is a technique through which subjects can be
sorted into classes on the basis of some known factor, and then
subsorted into sub-classes. The variance of these sub-classes
will be due to factors other than those which determine the class
sorting. This treatment, which, in effect, is partial ε, was used
in the study being described. Because two factors were to be
held constant, it was necessary to sub-subsort the data. For
example, to find the partial ε for education on achievement,
with training and work experience held constant, the following
classification was made:


              Work Experience                          No Work Experience
       Training          No Training            Training          No Training
 2 yrs. c.  4 yrs. c.  2 yrs. c.  4 yrs. c.  2 yrs. c.  4 yrs. c.  2 yrs. c.  4 yrs. c.


66 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


Through the calculation of corrected ε, it was possible to compare
directly the strength of the three factors, and thus to have
some statistical basis for noting the relative importance of these 
factors in respect to each of the objectives being measured. 

The conclusions reached from this study were as follows: 

1. The theory that work experience alone can be more
effective than formal training alone in teaching specific job
techniques was not substantiated in respect to the objective:
cultivation of skills in the use of retailing mathematics.
Formal training alone was found to be approximately
equal in effectiveness to work experience alone in this area.

2. The presumption that the combination of formal training 
and work experience together would prove more effective than 
either training or work experience alone was not consistently 
borne out, possibly because of limitations in the size of the 
sample studied. Although subjects in this group performed sig¬ 
nificantly better than the control group in the case of all but 
two sub-groups, these subjects did not consistently show sig¬ 
nificantly greater differences as compared with subjects with 
training or work experience alone. Many of the subjects tested 
had been working since graduation in personnel positions which 
did not directly involve customer contact or the use of mer¬ 
chandising mathematics, and the data suggest that as with 
training in other fields, people remember best those kinds of 
learning with which they are most directly interested or em¬ 
ployed. 

3. Of the five objectives measured, work experience was 
shown to be relatively the most effective in: (1) skill in the use 
of retailing mathematics, and (2) identification of retailing 
facts. Work experience was least effective in teaching the com¬ 
prehension of the nature of distribution. As indicated above, 
work experience equalled formal training in effectiveness only 
in respect to skill in the use of retailing mathematics. 

4. Subjects with four years of liberal-arts-college education 
were better prepared to be effective retail executives than those 
subjects with two years of liberal-arts-college work, except in 
the case of the objective: application of principles of retail 
management, where no significant relationship exists. 



THE CONCEPT OF VALIDITY IN THE INTERPRETATION
OF TEST SCORES


ANNE ANASTASI 
Fordham University 

If asked to define "validity,” most psychologists would prob¬ 
ably agree that validity is the closeness of agreement of a test 
with some independently observed criterion of the behavior 
under consideration. It is only as a measure of a specifically 
defined criterion that a test can be objectively validated at 
all. For example, unless we define "intelligence” as that com¬ 
bination of aptitudes required for successful school achieve¬ 
ment, or for survival on a certain type of job, or in terms of 
some other observable criterion, we can never either prove 
or disprove that a particular test is a valid measure of "intelli¬ 
gence.” The criterion may be expressed in very broad and 
general terms, such as "those behavior characteristics in which 
older children in our culture differ from younger children reared 
in the same culture,” but, however expressed, it defines the 
functions measured by the particular test. To claim that a 
test measures anything over and above its criterion is pure 
speculation of the type that is not amenable to verification 
and hence falls outside the realm of experimental science. 

To the question, "What does this test measure?”, the only 
defensible answer can thus be that it measures a sample of 
behavior which in turn may be diagnostic of the criterion 
or criteria against which the particular test was validated. 
Nor is there any circularity implicit in such a definition of 
validity, since a psychological test is a device for determining 
within a relatively short period of time what could otherwise 
be discovered only by means of a prolonged follow-up. For 
example, with a psychological test we may be able to predict 
within a certain margin of error which applicants will succeed 
on a given job or which students will be able to complete 
a medical course satisfactorily. Logically, the same information
could have been obtained, even more precisely, by hiring all
job applicants or admitting to medical school all students 
wishing to enroll, and observing the subsequent performance 
of each subject. The latter procedure is obviously so time- 
consuming and wasteful, however, as to be completely im¬ 
practicable. Hence the tests make a real contribution in per¬ 
mitting predictions in advance of lengthy observations. Another 
advantage of standardized psychological tests is that they make 
possible a comparison of the individual’s performance with 
that of other persons who have been observed in the same 
sample situation represented by each test. In other words, 
the tests provide norms for evaluating individual performance. 

Prediction and comparison with norms represent valuable 
contributions which psychological tests can render to our knowl¬ 
edge of individual behavior, the practical benefits of these 
contributions having been widely demonstrated. It is of funda¬ 
mental importance, however, to bear in mind that psychological 
tests do not provide a different kind of information from that 
obtained by any other observation of behavior. The use of 
such labels as “intelligence,” “aptitude,” “capacity,” and “potentiality”
has probably done much to make test users lose
sight of the empirical validation of tests. A number of current 
disagreements regarding the interpretation of test results and 
the susceptibility of tested abilities to training may be trace¬ 
able to a failure to take due cognizance of validation procedures. 
Many test users apparently give only preliminary and possibly 
perfunctory attention to validation data, in order to reassure 
themselves at the outset that the test is “satisfactory.” Their 
interpretation of the scores obtained with such a test, however, 
often takes no account of the validation data and is expressed 
in terms which bear little or no relation to the criterion. 

Perhaps one of the most common examples of such an in¬ 
consistent treatment of test validity is provided by what we 
may call the argument of “extenuating circumstances.” Let 
us suppose that a child obtains an IQ of 58 on a verbal intelli¬ 
gence test, and that the examiner subsequently finds evidence 
of a fairly severe language handicap in this child owing to 
foreign parentage. It is a common practice to conclude in 
such a case that the obtained IQ is not “valid,” on the grounds
that the verbal content of the test rendered it unsuitable for 
testing such an individual. At this point we may inquire, 
however, “On the basis of what criterion is this IQ invalid?” 
Certainly the obtained IQ may be a valid measure of the 
behavior defined by the criterion against which the particular 
test was validated. It is very likely that the same language 
handicap which interfered with performance on this test will 
interfere with the child’s behavior in other linguistic situations 
of which this test is an adequate index. The correspondence 
with the criterion may thus be just as close for this child as 
for children without a language handicap. In school, for ex¬ 
ample, the language handicap would probably interfere with 
the child’s acquisition of important skills and information. 
The resulting academic backwardness, together with the origi¬ 
nal language handicap itself, would, in turn, affect certain 
aspects of job performance and other areas of adult activities. 
Conversely, any remedial efforts designed to eliminate the 
language handicap would produce an improvement, not only 
in the tested IQ, but also in the broader area of behavior 
of which this test is a predictor. 

It should be added parenthetically that language handicap 
has been chosen as an example only for purposes of discussion. 
A number of other “extenuating circumstances,” such as visual 
or auditory defects, emotional and motivational factors, in¬ 
adequate schooling, and the like, could have served equally 
well to illustrate the point. Similarly, the discussion has been 
limited to intelligence tests, since it is chiefly in connection 
with these tests that many confusions regarding validity have 
arisen. The entire discussion applies equally well, however, 
to all types of psychological tests.

Specifically, how does the case cited in our illustration, as 
well as others of its type, differ from those in which no question 
is raised regarding the “validity” of the test or its applicability 
to the particular individual? First, in the present case the 
examiner has direct and certain knowledge regarding at least 
one of the factors which determine the subject’s subnormal 
performance, viz., language handicap. In other cases, the prin¬ 
cipal determining factor might be inferior schooling facilities, 
parental illiteracy, cerebral birth injuries, a defective thyroid,
or any of a large number of psychological or biological condi¬ 
tions. Yet it is doubtful whether the IQ would be considered 
“invalid” in all of these cases simply because it proved possible
to point to a specific condition as the determining factor in 
the poor test performance. To be sure, in many cases of low 
IQ, the examiner has little or no knowledge about the cir¬ 
cumstances or conditions which lead to the intellectual back¬ 
wardness. But such ignorance is obviously no more conducive 
to “valid” testing. Quite apart from the question of validity, 
the examiner should, of course, make every effort to under¬ 
stand why the individual performs as he does on a test. The 
fullest possible knowledge of the individual’s pre- and post¬ 
natal environment, structural deficiencies, and any other rele¬ 
vant conditions in his reactional biography is desirable for the 
most effective use of the test data. But to explain why an individual
scores poorly on a test does not “explain away” the
score. There are always reasons to account for an individual’s 
performance on a test. Language handicap is just as real as 
any other reason. 

A second distinguishing feature of our example is that such 
a language handicap is usually remediable. The individual need 
not be permanently backward in intellectual performance, but 
with special training he may in large measure compensate 
for past losses in intellectual progress. Susceptibility to treat¬ 
ment is, however, a matter of degree. Many of the conditions 
determining intellectual performance, whether structural or 
functional, are amenable to change under special treatment. 
Moreover, conditions for which no effective therapy is now 
known may yield to newly developed treatments in the future. 
The distinction in terms of remediability is thus rather tenuous. 
Nor does such a distinction have any direct bearing upon the
validity of a measuring instrument. A thermometer may be
a valid index of fever, despite the fact that the administration 
of medicine will cure the fever. 

Thirdly, some may point out that language handicap is 
not hereditary and may maintain that for this reason its influence
upon test performance ought to be “ruled out.” Such
an objection contains a tacit assumption that psychological
tests are primarily concerned with those individual differences
in behavior which can be attributed to heredity. Since the
number of hereditary conditions which have been clearly related
to behavior differences is extremely small, such a policy,
if followed consistently, would mean the virtual cessation of 
psychological testing. Moreover, the connection between hered¬ 
itary mechanisms and behavior is so remote and indirect as 
to render the distinction between hereditary and environmental 
factors in behavior largely an academic one (cf., e.g., 2). Above
all, it should be noted that no criterion against which any 
psychological test has been validated is itself traceable to purely 
hereditary factors. Hence no such test has been proved to be 
a valid measure of individual differences in hereditary charac¬ 
teristics. 

A fourth point to be considered is that of comparability. 
It may be objected that the individual who is handicapped 
by language difficulties, sensory deficiencies, or similar “extenuating
circumstances” is not comparable to the validation
group on which the test norms were established. The require¬ 
ment of comparability in the application of psychological tests 
needs further clarification. If individuals are entirely similar 
in all of the conditions (psychological, physiological, etc.) which 
influence the behavior measured by a particular test, individual 
differences will disappear, all subjects receiving the same score. 
Obviously no test is designed to measure behavior independ¬ 
ently of the conditions which determine such behavior—that 
would be a logical absurdity as well as an empirical impossi¬ 
bility. When the conditions in which the individual differs 
from the standardization group affect the test and the criterion 
in an approximately equal manner and degree, the validity of
the test for that individual will not be appreciably influenced 
by the lack of comparability of the individual to the standard¬ 
ization group. 
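The point of this paragraph can be illustrated numerically. What follows is only a sketch under stated assumptions: the scores are invented, the 40-point handicap is arbitrary, and the helper names `fit_line` and `prediction_error` are hypothetical. It contrasts a condition that depresses test and criterion together with one that depresses the test alone.

```python
# Hypothetical (test score, criterion score) pairs for a standardization group.
# The numbers are invented for illustration only.
standardization_group = [(90, 62), (100, 70), (110, 78), (120, 86), (130, 94)]


def fit_line(pairs):
    """Return (slope, intercept) of the least-squares line criterion = a*test + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(standardization_group)


def prediction_error(test_score, criterion_score):
    """Observed criterion minus the criterion predicted from the test score."""
    return criterion_score - (slope * test_score + intercept)


# A condition that depresses test AND criterion together:
# the individual stays on the regression line, so the prediction still holds.
broad_error = prediction_error(60, 38)

# A condition that depresses only the test score (criterion unchanged):
# the test now underpredicts the criterion for this individual.
narrow_error = prediction_error(60, 70)

print(round(broad_error, 2), round(narrow_error, 2))
```

In the first case the individual differs from the standardization group, yet the prediction error is negligible; in the second the same test score misses the criterion by a wide margin.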

This question of “comparability” pertains not so much to 
the measurement of behavior as to the analysis of the etiology 
of behavior differences. It is only when attributing the observed 
individual differences in test scores to a particular factor or 
class of factors that the investigator must make certain that 
other contributing factors have been reasonably constant. For 
example, if a few individuals in a group have a language
handicap while the rest do not, we could not ascribe individual 
differences in performance within this group to structural dif¬ 
ferences in the nervous system, or to any other factor whose 
contribution to behavior we may be investigating. The same 
limitation would apply, however, if educational opportunities, 
family traditions, incentives for intellectual activities, or any 
other factor were not held constant. The fact that the influence 
of language handicap, sensory deficiencies, and a few other 
conditions is more readily apparent does not place such con¬ 
ditions in a different category. The question of comparability 
applies equally to all conditions other than the one under 
investigation. 

A fifth consideration pertains to the use of test scores in 
prediction. Could an IQ obtained by a child with a language 
handicap serve as a basis for predicting the subsequent be¬ 
havior of the individual? As long as the language handicap 
remains, the test score can provide an accurate prognosis of 
the child’s behavior in situations demanding the type of verbal 
responses sampled by the test. It is only in this sense that any 
psychological test makes predictions possible. Within a certain 
margin of error, behavior can be predicted under existing con¬ 
ditions. But if, for example, any detrimental conditions such 
as poor schooling, sensory deficiencies, or the like are corrected, 
then performance on both test and criterion will show improve¬ 
ment. In discussions of test reliability, various writers during 
the past twenty-five years have pointed out that a psychological 
test should be expected to reflect changes in behavior at differ¬ 
ent times and under different conditions. 1 For test scores to 
remain constant when conditions affecting the subject’s be¬ 
havior have altered would indicate a crude and relatively in¬ 
sensitive measuring instrument, rather than a highly “reliable” 
one. The same logic applies to validity. If the subjects’ test 
scores remain unchanged despite the modification of conditions 
which affect criterion performance, the test cannot have high 
validity.

1 Cf., e.g., 1, 4, 5, 6, 9, 10, 11, 12, 15, 18, 19.

Closely related to the problem of prediction is the scope
or breadth of influence of any given condition upon the individual’s
behavior. For example, the presence of a loud, irregular
noise during the testing would probably affect the score on
that test, without influencing the individual’s behavior in other 
situations. A toothache or a severe cold on the day of the 
testing would be further illustrations of narrowly limited con¬ 
ditions. In the case of these conditions, the prognostic value 
of the test for the individual would indeed be reduced, in 
much the same manner that holding an ice cube in the mouth 
would invalidate an oral thermometer reading of bodily tem¬ 
perature. Conditions such as language handicap, however, affect 
the individual’s behavior in a much broader area than that 
of the immediate test situation. They may thus influence both 
criterion and test score in a similar manner. 

The import of the above analysis is that validity should 
be consistently interpreted with reference to the specific criteria 
against which the given test was validated. It also follows that 
validity is not a function of the test but of the use to which 
the test is put. A test may have high validity for one criterion 
and low or negligible validity for another. The attitude that 
a good test has “high validity” and a poor test has “low 
validity” is still too prevalent among test users. Tests cannot 
be validated in the abstract, nor is the usual concept of validity 
itself universally applicable to psychological testing. It is only 
when tests are employed for predictive or diagnostic purposes 
that the correlation with an external criterion is relevant at 
all. In many investigations concerned with fundamental be¬ 
havior research, tests are employed merely as behavior samples 
obtained under standardized (i.e., uniform) conditions, without 
reference to the correlations of these samples with other, “everyday-life”
behavior samples (i.e., practical criterion measures).
When the maze-learning behavior of white rats is tested, for 
example, the maze is not first “validated” against the rats’ 
success in finding food in a grocery basement, or their ability 
to avoid contact with prowling cats, or any other criteria of 
achievement in the rats’ extra-laboratory or workaday world. 
The investigator may quite reasonably argue that for the study 
of the particular principles of behavior which he is investigating,
maze-learning is as “good” a sample of behavior as cat-avoiding,
and that he has no more reason for validating the
former against the latter than vice versa. 
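The statement that a test may have high validity for one criterion and low or negligible validity for another can be made concrete with a small computation. The sketch below assumes that validity is expressed as a product-moment correlation; all scores, the two criteria, and the function name `pearson_r` are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented scores for six individuals.
test_scores = [10, 12, 14, 16, 18, 20]
school_grades = [55, 60, 68, 71, 80, 84]   # hypothetical criterion A
job_tenure_years = [3, 7, 2, 6, 4, 5]      # hypothetical criterion B

validity_for_grades = pearson_r(test_scores, school_grades)
validity_for_tenure = pearson_r(test_scores, job_tenure_years)

print(f"validity against school grades: {validity_for_grades:.2f}")
print(f"validity against job tenure:    {validity_for_tenure:.2f}")
```

The same set of test scores yields two quite different coefficients, so that "the validity of the test" has no meaning until the criterion is named.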



Fundamentally, any validation procedure provides a measure 
of the relationship between two behavior samples. As Guilford 
has recently expressed it, “In a very general sense, a test is 
valid for anything with which it correlates” (7, p. 429). The
process can be regarded as irreversible only when one of the 
behavior samples has greater importance than the other for 
a specific purpose. 2 In such a case, the more important behavior 
sample is designated the “criterion.” No basic difference exists 
between “criteria” on the one hand and “tests” on the other.
They are merely different samples of behavior whose inter¬ 
relationships permit predictions from one to the other. We 
could predict intelligence test scores from school achievement, 
although the process would be needlessly time-consuming. In 
such a case, the intelligence test scores would constitute the 
criterion.

The criterion is not intrinsically superior in any sense. It 
is well known, for example, that many commonly used criteria, 
such as school grades or job advancement, may be influenced 
by many factors “extraneous” to the quality of the individual’s 
performance. Yet, if it is our object to predict such criteria, 
with all their irrelevancies and shortcomings, then the correla¬ 
tion of a given test with such criteria is the validity of the 
test in that situation. To be sure, the immediate criterion 
against which a test is validated may itself have been chosen 
as a convenient index or predictor of a broader and less readily 
observable area of behavior. For example, a pilot aptitude 
test may be validated against performance in basic flight train¬ 
ing, the latter being in turn regarded as an approximate index 
of achievement in more advanced training and even possibly 
of ultimate combat performance. Such “successive validation” 
would be quite consistent with the relativity of predictors and 
criteria. It might be noted parenthetically that it is only when 
criterion measures are themselves used as predictors of further 
behavior that one may legitimately speak of the reliability 
and validity of the criterion itself (cf. e.g., 8). 


2 To be sure, when the relationship between the two variables is curvilinear, prediction
will not be equally accurate in both directions, since $\eta_{xy} \neq \eta_{yx}$. In such cases,
however, there is no a priori reason to expect that the correlation will be any higher
when predicting the “criterion” from the “test” than when predicting the “test” from
the “criterion.”
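The asymmetry mentioned in this footnote may be seen in a small computation. The sketch assumes that the footnote refers to the correlation ratio (eta), computed here as the square root of the between-group sum of squares over the total sum of squares; the data and the function name `correlation_ratio` are invented for illustration.

```python
from collections import defaultdict


def correlation_ratio(predictor, outcome):
    """Eta for predicting `outcome` from the distinct values of `predictor`."""
    groups = defaultdict(list)
    for p, o in zip(predictor, outcome):
        groups[p].append(o)
    grand_mean = sum(outcome) / len(outcome)
    ss_total = sum((o - grand_mean) ** 2 for o in outcome)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return (ss_between / ss_total) ** 0.5


# Invented scores with a U-shaped (curvilinear) relationship.
test = [1, 2, 3, 4, 5]
criterion = [4, 1, 0, 1, 4]

eta_criterion_from_test = correlation_ratio(test, criterion)
eta_test_from_criterion = correlation_ratio(criterion, test)

print(eta_criterion_from_test, eta_test_from_criterion)
```

Here the criterion is fully determined by the test score, whereas the test score cannot be recovered at all from the criterion. The direction of the inequality depends on the data and is not fixed in advance.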



Validation against a “practical” criterion is essential for 
many uses to which tests are put. It should not be assumed, 
however, that only tests which have been validated against 
some criterion considered important within a particular cul¬ 
tural setting can be used in behavior research. In order to be 
able to generalize from any obtained test score, we need only 
to know the relationships between the tested behavior in ques¬ 
tion and other behavior samples, none of these behavior samples 
necessarily occupying the preeminent position of a criterion. 
Thus, if the investigator is interested in the possible use of 
maze-learning performance as a basis for predicting the rats’ 
behavior in other learning situations, he will have to correlate 
the subjects’ maze-learning scores with their scores in a variety 
of other learning tasks. If a common factor is identified through 
these different learning scores, the “factorial validity” (7) of 
any one of the tests in predicting that which is common to 
all of them can be determined. On the other hand, if no single 
learning factor is demonstrated, then the area within which 
predictions can be made must be accordingly narrowed to 
fit the confines of whatever common factor does become evident. 
Investigations conducted to date on human subjects, for example,
have failed to indicate the presence of a common “learning
factor” (20, 21), and animal studies have revealed even
greater specificity (cf., e.g., 14, 16, 17). But such specificity, 
if further corroborated, is an empirically observed fact whose 
discovery is useful in its own right in advancing our knowledge 
of behavior; it should not be construed as a weakness of the 
tests. 
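The procedure described in this paragraph, correlating maze-learning scores with scores on other learning tasks and examining what is common to them, can be sketched as follows. This is an illustration under several assumptions: the scores are invented, the task names other than maze learning are hypothetical, and loadings on the first principal component (found by power iteration) are used as a rough stand-in for a common-factor analysis rather than as the method of any cited author.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented scores for six animals on four tasks (illustrative only).
scores = {
    "maze": [12, 15, 9, 20, 17, 11],
    "discrimination": [30, 34, 25, 41, 39, 27],
    "detour": [5, 6, 4, 9, 7, 5],
    "conditioning": [7, 4, 4, 4, 7, 4],
}
tasks = list(scores)
corr = [[pearson_r(scores[a], scores[b]) for b in tasks] for a in tasks]


def first_component_loadings(matrix, iterations=200):
    """Loadings on the first principal component, found by power iteration."""
    n = len(matrix)
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    return [sqrt(eigenvalue) * v for v in vector]


loadings = dict(zip(tasks, first_component_loadings(corr)))
for task, loading in loadings.items():
    print(f"{task:15s} {loading:.2f}")
```

With these invented data three of the tasks load heavily on the common component and the fourth scarcely at all, so that predictions from maze performance would have to be confined to the first three.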

Whether we are dealing with common factors and “factorial 
validity” or with “practical validity” in the prediction of every¬ 
day-life criteria, the question of validity concerns essentially 
the interrelationships of behavior samples. In the latter case, 
one sample is represented by the test and another, probably 
much more extensive sample, by the criterion. In the former 
case, the different tests which are correlated constitute the 
behavior samples. Nor should the terminology of factor analysis 
mislead us into the belief that anything external to the tested 
behavior has been identified. The discovery of a “factor” means
simply that certain relationships exist between tested behavior 
samples. 



The common misconception that the criterion is in some 
mysterious fashion more basic than the test probably results, 
in part, from the belief that tests measure hypothetical “underlying
capacities” which are distinguishable from observed behavior.
Discussions of psychological tests often become hopelessly
entangled because of the implicit supposition that tests
can be validated against such underlying capacities as criteria.
Any operational analysis of actual validation procedures re¬ 
veals the futility and absurdity of such an expectation. 

In this connection we may consider a monograph by Thomas 
(13), which sounds a note of acute pessimism regarding the 
use of mental tests as “instruments of science.” Through a 
careful and systematic logical analysis, the author demon¬ 
strates the fallacies inherent in any attempts to interpret psy¬ 
chological tests as measures of “innate abilities,” hypostatized 
“fundamental human capacities,” and the like. He clearly recognizes
that “the methodology of mental testing provides
no way of operationally defining an ability and a performance
as distinct . . . entities” (13, p. 75). But, in his final conclusions,
the author seems to exhibit the same confusions which he had 
previously sought to eliminate. 3 For example, in the attempt 
to evaluate the scientific usefulness of psychological tests, he 
raises such questions as the following: “Do two identical scores 
mean that the same kind and amount of psychological processes 
were employed? Do they mean similar sociological backgrounds 
of experience? Do they mean a qualitatively similar adaptation 
to the immediate test environment? Do they mean that com¬ 
parable amounts of psychic tension were built up or that similar 
amounts of nervous energy were expended?” (13, p. 77). By 
way of reply he adds: “The achievement of such scientific 
meanings as these from the current methodology of mental 
testing is probably too much to expect, for test results at 
present are notoriously ambiguous in what they signify about 
the socio-psychological ingredients of the recorded perform¬ 
ances” (13, p. 77). 


3 These confusions in the fundamental argument do not detract from the value
of certain more specific points discussed in this monograph, such as the limitations of 
ordinal scales, and the concepts of difficulty value and homogeneity in test construction. 
But these problems have also been analyzed by other writers, in a somewhat more 
constructive manner (cf., e.g., 3, 10). 



Two weaknesses are apparent in such an argument. First, 
the testing of behavior is being confused with an analysis of 
the factors which determine behavior. Secondly, despite his 
earlier advocacy of an operational definition of “ability,” the 
author now appears to be chasing the will-o’-the-wisp of “psy¬ 
chological processes” which are distinct from performance. He 
seems thus to be demanding that in order to be proper instru¬ 
ments of science, psychological tests should measure functions 
which by definition fall outside the domain of scientific inquiry! 

In summary, it is urged that test scores be operationally 
defined in terms of empirically demonstrated behavior relation¬ 
ships. If a test has been validated against a practical criterion 
such as school performance, the scores on such a test should 
be consistently defined and treated as predictors of school 
performance rather than as measures of hypostatized and unverifiable
“abilities.” It is further pointed out that conditions
which affect test scores may also affect the criterion, since both 
test scores and criteria are essentially behavior samples. The 
extent or breadth of such influences is a matter for empirical 
determination, rather than for a priori assumption. Moreover, 
the validity of a psychological test should not be confused 
with an analysis of the factors which determine the behavior 
under consideration. Finally, it should be noted that the dis¬ 
tinction between test and criterion is itself merely one of prac¬ 
tical convenience. The scientific use of tests is not predicated 
upon the assumption that criteria are a separate class of phe¬ 
nomena against which all tests must first be validated. Essen¬ 
tially, generalization and prediction in psychology require 
knowledge of the interrelationships of behavior, regardless of 
the situation in which such behavior was observed. 

REFERENCES 

1. Anastasi, A. “The Influence of Practice upon Test Reliability.”
   Journal of Educational Psychology, XXV (1934), 321-335.

2. Anastasi, A. and Foley, J. P., Jr. “A Proposed Reorientation in
   the Heredity-Environment Controversy.” Psychological Review,
   LV (1948), 239-249.

3. Coombs, C. H. “Some Hypotheses for the Analysis of Qualitative
   Variables.” Psychological Review, LV (1948), 167-174.

4. Cronbach, L. J. “Test ‘Reliability’: Its Meaning and Determination.”
   Psychometrika, XII (1947), 1-16.

5. Dunlap, J. W. “Comparable Tests and Reliability.” Journal of
   Educational Psychology, XXIV (1933), 442-453.

6. Goodenough, F. L. “A Critical Note on the Use of the Term
   ‘Reliability’ in Mental Measurement.” Journal of Educational
   Psychology, XXVII (1936), 173-178.

7. Guilford, J. P. “New Standards for Test Evaluation.” Educational
   and Psychological Measurement, VI (1946), 427-438.

8. Jenkins, J. G. “Validity for What?” Journal of Consulting
   Psychology, X (1946), 93-98.

9. Kuhlmann, F. Tests of Mental Development. Minneapolis: Educational
   Test Bureau, 1939.

10. Loevinger, J. “A Systematic Approach to the Construction and
    Evaluation of Tests of Ability.” Psychological Monographs,
    LXI (1947), No. 4.

11. Paulsen, C. B. “A Coefficient of Trait Variability.” Psychological
    Bulletin, XXVIII (1931), 218-219.

12. Skaggs, E. B. “Some Critical Comments on Certain Prevailing
    Concepts Used in Mental Testing.” Journal of Applied
    Psychology, XI (1927), 503-508.

13. Thomas, L. G. “Mental Tests as Instruments of Science.”
    Psychological Monographs, LIV (1942), No. 3.

14. Thorndike, R. L. “Organization of Behavior in the Albino Rat.”
    Genetic Psychology Monograph, XVII (1935), No. 1.

15. Thouless, R. H. “Test Unreliability and Functional Fluctuation.”
    British Journal of Psychology, XXVI (1935-1936), 315-343.

16. Van Steenberg, N. J. F. “Factors in the Learning Behavior of the
    Albino Rat.” Psychometrika, IV (1939), 179-200.

17. Vaughn, C. L. “Factors in Rat Learning: An Analysis of the
    Intercorrelations Between 34 Variables.” Psychological
    Monographs, XIV (1937), No. 69.

18. Wherry, R. J. and Gaylord, R. H. “The Concept of Test and Item
    Reliability in Relation to Factor Pattern.” Psychometrika,
    VIII (1943), 247-264.

19. Woodrow, H. “Quotidian Variability.” Psychological Review,
    XXXIX (1932), 245-256.

20. Woodrow, H. “The Relation Between Abilities and Improvement
    with Practice.” Journal of Educational Psychology, XXIX
    (1938), 215-230.

21. Woodrow, H. “Factors in Improvement with Practice.” Journal
    of Psychology, VII (1939), 55-70.



THE LOGIC OF SCALE CONSTRUCTION 1 

EDWARD A. SUCHMAN 
Cornell University 

Most of the classifications used in the course of our daily 
communication with one another are not defined with any great 
exactitude. For ordinary purposes of communication, it is 
usually not necessary to formulate a set of rules to distinguish 
between those things which belong to a certain class and those 
which do not. Agreement as to what constitutes membership 
in a class of objects is common enough to permit understanding 
without resort to explicit classification schemes. People can talk 
and write about “beautiful women,” “successful men,” “good 
books” or “prosperous nations” without bothering to state the 
rules for their classifications. These “loose” classifications con¬ 
stitute an important part of our communicatory system. 

The Need for More Precise Classifications 

To the scientist, however, who must work with these classifi¬ 
cations, such loose usage often proves inadequate. Scientific 
communication demands a more rigorous statement of the bases 
for the classifications used. One of the tasks of the scientist
becomes the translation of the loose descriptive terminology of 
ordinary social intercourse into the more precise classificatory 
systems of science. To the scientist the statements of Mr. Jones 
to Mr. Smith that, “Mr. Brown is a successful lawyer,” or “Mr.
Greene is an anti-Semitic person,” or “The United States is a 
prosperous country,” present problems in definition. What is 
meant by “a successful lawyer,” or “an anti-Semitic person,” 
or “a prosperous country”? 

1 The author wishes to acknowledge the valuable contributions of Paul F. Lazarsfeld
and Louis Guttman to the present formulation of the problem.

The need for such precise definition becomes apparent in
ordinary communication when there is a disagreement between
Mr. Jones and Mr. Smith. This disagreement illustrates the
problem of communicatory classification which the scientist is 
attempting to solve. Two persons who disagree on how to 
classify a third person or object find themselves faced with the 
difficult problem of defining the bases for their classification.
To reach an agreement they are forced to tighten the loose 
classificatory system which usually suffices when both are in 
agreement. So long as both individuals agree that “Mr. Brown
is a successful lawyer,” they will feel little need to define what
they mean by “successful.” However, when a disagreement
occurs, they are forced to state more precisely what they mean
by “successful” or “unsuccessful.” This transition from a loose
classification to a more rigorous classification constitutes one of 
the most important tasks of the social sciences. How can this 
transition be accomplished? 

The Problem of Scale Construction 

The efforts of social scientists to define the meaning of some
attribute or variable in such a way as to permit the classification
of persons or objects according to the degree to which that
attribute is present or absent constitute the problem of scale
construction. As stated by Lundberg, there are two principal
aspects to this problem, “(1) How shall we select the aspects or
factors of a unit which we deem significant and which are
therefore to be considered in our scale? (2) How shall we determine
the relative weight to attach to each factor included?” 2
These problems of item selection and item weights occupy a 
central position in most current methods of scale construction. 

However, we propose to show that in the case of a uni-dimensional
scale these two problems are actually non-existent. The
theory of "scalability” to be developed is based upon the funda¬ 
mental concept that if an area is uni-dimensional, then (1) any 
series of items selected from that area is interchangeable with 
any other series of items, and (2) any set of weights given to a 
series of items will produce the same rank order of objects or 
individuals as any other set of weights. The problem of scale
construction, therefore, takes the form of a test for uni-dimensionality,
rather than the arbitrary treatment of non-scalable
data as if it were scalable.

2 Lundberg, George. Social Research. New York: Longmans, Green & Co., 1941,
p. 259.
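The second claim, that in a uni-dimensional area any set of weights yields the same rank order, can be illustrated with a small computation. The sketch rests on assumptions not stated in the text: the response patterns are invented, "uni-dimensional" is represented by a perfectly cumulative pattern of responses, and the weights are restricted to positive values. The names `rank_order`, `scalable` and `non_scalable` are hypothetical.

```python
# Invented response patterns (1 = item endorsed) that form a perfect cumulative scale.
scalable = {
    "A": (1, 1, 1, 1),
    "B": (1, 1, 1, 0),
    "C": (1, 1, 0, 0),
    "D": (1, 0, 0, 0),
    "E": (0, 0, 0, 0),
}

# Two patterns that do not fit a single cumulative ordering.
non_scalable = {"P": (1, 0, 0, 1), "Q": (0, 1, 1, 0)}


def rank_order(responses, weights):
    """Names sorted from highest to lowest weighted score."""
    totals = {
        name: sum(w * r for w, r in zip(weights, pattern))
        for name, pattern in responses.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


weights_a = (0.5, 3, 1, 7)   # arbitrary positive weights
weights_b = (1, 5, 5, 1)     # a different arbitrary set

print(rank_order(scalable, weights_a), rank_order(scalable, weights_b))
print(rank_order(non_scalable, weights_a), rank_order(non_scalable, weights_b))
```

For the cumulative patterns both sets of weights give the same order of individuals; for the two non-cumulative patterns the order reverses when the weights are changed, which is the situation in which the choice of weights becomes a genuine problem.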

First, we will deal with the problem of item selection and, 
second, with the problem of item weights. 

“Non-itemized” versus “Itemized” Classifications

An attempt to clear up the disagreement between Mr. Jones 
and Mr. Smith discussed above may take two different lines of 
development: (1) the introduction of additional judgments
from other persons, or (2) the listing of those items which serve 
to characterize the different classes. The first approach, that of 
“non-itemized” judgments or ratings, represents an attempt to 
reach an agreement based upon the opinions of other judges, 
without attempting to characterize or describe further the basis 
for the judges’ ratings. The second approach, that of “itemized” 
classification, requires the listing of a characterizing aggregate 
of items which serves as the basis for the classification to be 
made. 

Let us see how these two approaches would apply to the 
present problem. As an example of the first approach, Mr. 
Jones and Mr. Smith could attempt to settle their disagreement
as to whether Mr. Brown is a “successful” lawyer by asking a 
group of other people to classify Mr. Brown as “successful” or 
“unsuccessful.” The basis for agreement using this method 
might be the proportion of judges rating Mr. Brown as “successful”
or “unsuccessful.” This form of classification we shall call a
“non-itemized” classification. 

As a second approach, Mr. Jones and Mr. Smith could 
attempt to settle their disagreement by asking each other 
exactly what they mean by “successful” or “unsuccessful.” 
They would probably reply by pointing out certain characteristics
of Mr. Brown which to each of them signify the presence
or absence of “success.” The classification of “successful” is
expanded by the introduction of such items as “He has money,” 
or “People listen to what he has to say,” or “He has written 
many books,” and other classificatory items characteristic of 
“success.” As more and more specific items are added to the
general classification, the loose definition takes on a more
precise meaning. Specific actions or characteristics of Mr. Brown
are mentioned which afford the basis for the development of
classificatory techniques by means of which “successful” people
can be distinguished from “unsuccessful” people. The basis for
agreement using this method might lie in the number of
characteristics indicative of success which Mr. Brown possesses.
This form of classification we shall call an “itemized”
classification.

Thus, the need for a more precise classification, we have seen,
can lead to the use of “non-itemized” judgments or to the use of
“itemized” aggregates of characterizing attributes. Both methods
are currently being used by social scientists in their attempts
to classify data. Each method has its own particular set of
problems. The use of “non-itemized” judgments presents a
solution to the problem based upon ratings without any attempt
to produce a definition of the variable. The use of “itemized”
aggregates of attributes, on the other hand, attempts a solution
to the problem based upon a meaningful definition of the
variable. It is this latter method which will constitute the main
focus of the present attempt to arrive at a logical basis for scale
construction.

Let us look at the first question, “How shall we select the
aspects or factors to be considered in our scale?” 

The Concept of an “Itemized” Aggregate

An aggregate of items consists of a series of items which have 
been selected as characterizing some object or person. These 
characterizing items, as we shall see, form the basis for a 
rigorous system of measurement. The transition from a loose to 
a more precise classification, which is the task of the scientist, 
is accomplished through the organization of these characterizing 
items into coherent systems. 

The number of characterizing items that exist for any single
variable is unlimited. Furthermore, there appears to be little
inherent reason why any one item is better than any other.
“Success” may be defined in any number of different ways.
Theoretically there are an infinite number of classificatory items
which may be used to distinguish a “successful” from an
“unsuccessful” person, no single one of which is inherently better
than any other. How can such an infinitely broad range of
characterizing items be brought into the reach of the scientist
who desires to study them?

This concept of a universe of items can be illustrated by
examples from many different types of social phenomena. The
construction of an index of purchasing power may include
any sampling of characterizing items which come from
a universe of items characteristic of purchasing power.
The classification of individuals according to social status may
include a large group of characterizing items ranging from
income to the number of books read. The judgment of individuals
according to their ability to supervise men may include such
items as the amount of time spent talking to the men
and the score received on an intelligence test. The intelligence
test itself is composed of a wide range of items. The ranking of
individuals according to their attitude toward some issue is based
upon their responses to a series of attitude items. All of the
above areas are characterized by the use of a wide range of
items in an attempt to arrive at a more precise classification of
these areas. Another way of stating this would be to say that
an attempt is made to classify social phenomena by observing a
series of items which come from a universe of items
characteristic of these phenomena.

Sampling a Universe of Items 

We now come to an important aspect of this concept of a
universe of items: the sampling of items from this universe.
Since an unlimited number of items can be used to characterize
any concept, any definite number of items that are used
constitutes a sample from this unlimited universe. Any single item
used in practice is but a sample of one from this universe,
interchangeable with any other item from the universe
which might have been used in its place. The items used in a
scale of attitudes toward war, an intelligence test, a social-status
index, a rating sheet on efficiency of workers, a standard of
living index, a personality inventory, or in any classification
scheme in the social sciences are only a selection from an
infinitely large number of similar items. Thus the practical
problem of classification in the social sciences becomes one of
studying a universe on the basis of a sampling of items from that
universe.

The concept of an aggregate of characterizing items, thus, 
conceives of a sample from an unlimited number of items which 
may be used to characterize any social phenomenon. The
characterizing universe consists of all items which can be used to
exemplify the social concept. The determination of whether or 
not an item belongs to a certain universe, however, remains a 
matter which must be decided upon by common agreement. 
A characterizing item belongs to a universe on the basis of some 
arbitrary decision as to its content. The universe itself is
decided upon arbitrarily as the content of interest to the
investigator. Some additional means, such as the consensus of judges,
might be introduced to help the investigator, but the final 
decision of whether or not this item characterizes the universe 
or phenomenon of interest, must be a subjective one. 

As will be discussed in the next section, a test of scalability
can help one to eliminate certain obvious cases of
misinterpretation of the meaning of an item. But such ex post facto
rationalizations are to be rigorously avoided. If the decision is made
that this particular series of items represents the universe of 
interest, then eliminating items must result in a redefinition of 
one’s interests. Whether or not an item belongs to the universe 
must not be a decision based upon some “correlational” test— 
there must be an adequate “content” interpretation for both 
acceptance and rejection. 

Our answer to the problem of which factors to consider in a 
scale, therefore, is that one must first define the universe in 
which one is interested. This definition of the universe is a
subjective one and consists of the listing of characterizing
aggregates of items. The actual series of items that one uses in
practice can be conceived of as a sample of items from the
unlimited number that exists in the universe of content. The
problem now becomes one of determining how valid a representation
of the total universe the selected sample is. The answer to this 
problem depends upon the determination of the dimensionality 
of the universe. Does the universe consist of a single dimension? 
To answer this question, we turn next to a consideration of 
“dimensionality.” 





The Concept of a “Uni-dimensional” Aggregate of Items 3

Let us assume that we now have a tentative set of
characterizing items to be used for the classification of some social
phenomenon. What are the different patterns of
inter-relationships which these items can assume and of what importance are
these patterns to the problem of scale construction?

As an example of what might occur in the way of
inter-relationships, let us start out with a simple case of three items
only. Suppose, for example, in the previous problem of
classifying individuals according to how successful they are as lawyers,
we had decided to use the following three items:

1. Did he have an income of over $15,000 a year? 

2. Was he the author of any books on law?

3. Had he ever received any honors from the bar association? 

Suppose further that each item had been answered either 
“yes” or “no.” 

Conceivably then we might have the following eight types 
occurring among the lawyers whom we are interested in 
classifying: 



         Item 1     Item 2     Item 3
Type     (Money)    (Books)    (Honors)
  1       Yes        Yes        Yes
  2       Yes        Yes        No
  3       Yes        No         Yes
  4       No         Yes        Yes
  5       No         No         Yes
  6       No         Yes        No
  7       Yes        No         No
  8       No         No         No


We are now faced with the problem of ordering the above
eight types according to how successful each type is as a lawyer.
Types 1 and 8 give us no trouble; type 1, possessing all three of
the characterizing items of success, is most successful; and type 8,
possessing none of the characterizing items, is least successful.
However, we find that types 2, 3 and 4 each possess two of the
characterizing items of success. How are we to rank these
three types relative to each other? Should we give least weight
to “honors,” and rank type 2 above types 3 and 4, or should we
give least weight to “books,” and thus rank type 3 above
types 2 and 4? The same problem of weighting applies to types
5, 6 and 7, each of which possesses one of the characterizing
items. We are faced with the need to make some decision as to
how much weight to assign to each of the characterizing items.
We shall therefore call any aggregate of items with the above
pattern of inter-relationship an aggregate which presents a
problem of relative weights. Rank order for such a pattern cannot be
determined without assigning weights to the different items.
Furthermore, depending upon the relative weights assigned,
this rank order can vary with different sets of weights. This we
recognize as the second problem of scale construction: how
much weight to give to each item.

3 This concept of a “uni-dimensional” universe has been derived from the theory
of scaling developed by Louis Guttman. See “A Basis for the Scaling of Qualitative
Data,” American Sociological Review, IX (1944), 139-150.
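The dependence of rank order on item weights can be made concrete with a short Python sketch. It is only an illustration of the argument, not part of the original paper: the two weight sets are hypothetical, chosen so that one gives least weight to “honors” and the other gives least weight to “books.”

```python
# The eight response types from the table: (money, books, honors), 1 = Yes, 0 = No.
TYPES = {
    1: (1, 1, 1), 2: (1, 1, 0), 3: (1, 0, 1), 4: (0, 1, 1),
    5: (0, 0, 1), 6: (0, 1, 0), 7: (1, 0, 0), 8: (0, 0, 0),
}


def rank_order(weights, types=TYPES):
    """Return type numbers sorted from highest to lowest weighted score."""
    def score(t):
        return sum(w * x for w, x in zip(weights, types[t]))
    # Ties (none occur with the weights below) are broken by type number.
    return sorted(types, key=lambda t: (-score(t), t))


# Hypothetical weights for (money, books, honors); neither set is from the paper.
honors_least = rank_order((4, 2, 1))
books_least = rank_order((4, 1, 2))

print(honors_least)  # type 2 is placed above type 3
print(books_least)   # type 3 is placed above type 2
```

Under the first weighting type 2 outranks type 3, and under the second the order is reversed, which is exactly the arbitrariness the text describes.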

We now come to an important question, “Are there any 
aggregates of items which do not present a problem of relative 
weights?” It is to be expected that an affirmative answer to this 
question would depend upon our ability to find an aggregate of 
items which formed a rather special pattern of
inter-relationships.

Let us illustrate one such pattern by means of the previous 
example. Suppose we found that the relationship between the 
three characterizing items was such that only four out of the 
eight possible types actually occurred. There would be (a) the 
type that possessed all three characteristics, (b) the type that 
possessed characteristics 2 and 3 only, (c) the type that
possessed characteristic 3 only, and finally (d) the type that
possessed none of the characteristics. In other words, only types 
1, 4, 5 and 8, as listed above, would be found to occur in 
actuality. 

Let us repeat this listing of types including only the above 
four types. 


Type     (Money)    (Books)    (Honors)
  1       Yes        Yes        Yes
  4       No         Yes        Yes
  5       No         No         Yes
  8       No         No         No


Under what conditions could we expect the occurrence of
only the above four types? The answer to this question is found
in the pattern of inter-relationship between the items. First, we
find that the types can be ordered, depending upon the number
of characteristics each type possesses. No two types have the
same number of characteristics. Second, we find that the items or
characteristics can be ordered, depending upon the number of
types that possess that characteristic. No two items are possessed
by the same number of types.

The result of this ordering process of characteristics and 
types produces a definite pattern of inter-relationship. This 
pattern can be easily recognized if we separate the possession 
of a characteristic from its absence, and then order both
characteristics and types according to frequency of occurrence.
The result of such an ordering process is a parallelogram. 

This pattern could be represented as follows: 





                                       Does Not   Did Not   Has Not
         Has      Wrote     Received   Have       Write     Received
Type     Money    Books     Honors     Money      Books     Honors
  1       X        X         X
  4                X         X          X
  5                          X          X          X
  8                                     X          X         X



where an X represents the characteristics of each type. A 
parallelogram pattern such as the above offers no problem in 
weights. No matter what weights were given to each of the 
items, the rank order of Types 1, 4, 5, and 8 would be the same,
because each type possesses all of the characteristics of the 
type below it, and one more in addition. The rationale for such 
a pattern will become clearer after the following discussion. 
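The claim that the parallelogram pattern offers no problem of weights can be checked numerically. The Python sketch below is an illustration only; it assumes that every item weight is positive (the text does not say so explicitly, but a zero or negative weight would no longer reward possession of a characteristic), and the random weights are arbitrary.

```python
import random

# The four scale types: (money, books, honors), 1 = Yes, 0 = No.
SCALE_TYPES = {1: (1, 1, 1), 4: (0, 1, 1), 5: (0, 0, 1), 8: (0, 0, 0)}


def rank_order(weights, types):
    """Return type numbers sorted from highest to lowest weighted score."""
    def score(t):
        return sum(w * x for w, x in zip(weights, types[t]))
    return sorted(types, key=lambda t: (-score(t), t))


rng = random.Random(0)  # fixed seed so the run is repeatable
orders = set()
for _ in range(1000):
    # Arbitrary positive weights; the range 0.1 to 10 is an assumption.
    weights = tuple(rng.uniform(0.1, 10.0) for _ in range(3))
    orders.add(tuple(rank_order(weights, SCALE_TYPES)))

print(orders)  # a single rank order survives every weighting
```

Because each type possesses everything the type below it possesses plus one further item, adding any positive weight can only raise its score, so the order 1, 4, 5, 8 is the only one that appears.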

Another method of deriving this special pattern of relationship
between characterizing items which do not present a problem
of weights would be by means of cross-tabulation. What
form must a cross-tabulation between two items take in order 
for a rank order based upon these items to be independent of 
any weights the items might be assigned? One form, of course, 
would occur if these two items were perfectly correlated. There 
would be only two types of individuals in such a case—those 
with both characteristics and those with neither characteristic. 

This perfect correlation may be represented by a fourfold
table as follows:

                  Item 2
                 +     −
Item 1     +     N     0
           −     0     N


where + indicates the presence of that characteristic and −
indicates its absence. On the basis of this type of relationship
between items, all of the individuals in the + + cell may be
ranked above those in the − − cell. No matter what weights
are given to the items, the rank order will remain the same. 

A second possibility is that individuals may fall into three of 
the cells of such a fourfold table, as follows: 


                  Item 2
                 +     −
Item 1     +     N     N
           −     0     N


Here again there is no problem of relative weights to be assigned
the two items. Those individuals in the + + cell would receive
the highest rank order, those individuals in the − − cell would
receive the lowest rank order, while the only other group in the
+ − cell would fall in between the highest and the lowest
ranks. Again no matter what weights were given to the items,
the rank order would remain the same.

Finally, a third possibility is that individuals would fall into 
all four cells of the table, as follows: 

                  Item 2
                 +     −
Item 1     +     N     N
           −     N     N


While there is no problem of ranking in relation to the + + cell
and the − − cell, we find that how the individuals in the other
two cells were ranked would be completely dependent upon the
relative weights given to the two items. Here, then, we have the
problem of the relative importance of items or the problem of
weighting in scale construction.

A cross-tabulation between any two dichotomous items in a
series, therefore, must have the following characteristic in order
for rank order to be independent of item weights: all
cross-tabulations between the items should result in the absence of
any cases in one of the cells which represents a “positive”
answer on one item and a “negative” answer on another item.
This zero-cell, furthermore, must occur in the column which
contains the lowest positive frequency. For example, a
cross-tabulation between items 2 and 3 of the previous example would
have to look as follows:

                        Item 2 (Books)
                         Yes      No
Item 3         Yes        N        N
(Honors)       No         0        N


There should be no individuals who have written books, but
who have not received any honors. Any characterizing item
that is the property of a lower rank must also be the property of
all higher ranks, while the lower rank must lack the
distinguishing characterizing item of the upper rank. Thus, since “honors”
is a characteristic of Type 5, it must also be a characteristic of
Type 4 (a higher rank), but Type 5 in turn must lack the
distinguishing characteristic of the higher rank, in this case
“books.”
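The zero-cell requirement lends itself to a mechanical check. The following Python sketch is an illustration under stated assumptions: the respondent counts are invented for the example, and the helper names `fourfold` and `has_required_zero_cell` are hypothetical, not taken from the paper.

```python
from itertools import combinations


def fourfold(responses, i, j):
    """Count the four cells of the cross-tabulation of items i and j."""
    cells = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for r in responses:
        cells[(r[i], r[j])] += 1
    return cells


def has_required_zero_cell(responses, i, j):
    """True if the rarer item is never endorsed without the more common one."""
    cells = fourfold(responses, i, j)
    positives_i = cells[(1, 1)] + cells[(1, 0)]
    positives_j = cells[(1, 1)] + cells[(0, 1)]
    if positives_i <= positives_j:
        return cells[(1, 0)] == 0   # i is rarer: no one has i without j
    return cells[(0, 1)] == 0       # j is rarer: no one has j without i


# Invented data: (money, books, honors) for ten lawyers of types 1, 4, 5 and 8.
scale_data = [(1, 1, 1)] * 2 + [(0, 1, 1)] * 3 + [(0, 0, 1)] * 4 + [(0, 0, 0)]
# The same group plus one lawyer who wrote books but received no honors.
with_error = scale_data + [(1, 1, 0)]

pairs = list(combinations(range(3), 2))
print(fourfold(scale_data, 1, 2))
print(all(has_required_zero_cell(scale_data, i, j) for i, j in pairs))
print(all(has_required_zero_cell(with_error, i, j) for i, j in pairs))
```

In the first data set the cell “books yes, honors no” is empty for every pair of items; the single added respondent fills that cell and the books-honors table no longer meets the requirement.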

The parallelogram pattern which permits the determination
of a rank order without presenting the need for assigning
arbitrary weights to the various items will be called a
uni-dimensional pattern. Such a uni-dimensional pattern can be
determined empirically, first by ordering items according to
ascending order of positive frequencies, i.e., “money” is a
characteristic of fewest lawyers, and is therefore placed before
“books,” which in turn precedes “honors,” and then by ordering
individuals according to the number of characterizing items
they possess. If, as a result of this ordering of items and
individuals, the aggregate of items with which one is dealing forms
a parallelogram pattern, then we can proceed to classify
individuals according to a rank order which is independent of any
weights which the items might be given. Such a rank order has
the property of permitting one to derive from the rank order
the exact characteristics of the individuals in that rank, since
there is only one possible combination of items for any single
rank order. Furthermore, the rank order has the quality that
any individuals in a higher rank possess all the characteristics of
the individuals in a lower rank, and at least one more in
addition. This property of reproducibility of characteristics from a
knowledge of rank order can only be present where the aggregate
of characterizing items does not present a problem of relative
weighting. It permits a more clear-cut rationale for ranking
individuals along a single continuum than is possible when the
rank order must be based upon an arbitrary decision of how
much weight to assign each item. 4
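The empirical procedure just described (order the items by ascending positive frequency, order individuals by the number of items possessed, and look for the parallelogram) can be sketched in Python. This is a minimal illustration that tests only for a perfect pattern; it is not the Cornell technique cited in footnote 4, and the function names and respondent data are hypothetical.

```python
def scale_pattern(responses):
    """Order items by ascending positive frequency and test for a perfect scale.

    Returns (item_order, is_scale), where item_order lists item indices from
    the least to the most frequently possessed.
    """
    n_items = len(responses[0])
    freq = [sum(r[i] for r in responses) for i in range(n_items)]
    item_order = sorted(range(n_items), key=lambda i: freq[i])
    is_scale = True
    for r in responses:
        ordered = [r[i] for i in item_order]
        score = sum(ordered)
        # In a perfect scale a person with score s has exactly the s most
        # common items and none of the rarer ones.
        expected = [0] * (n_items - score) + [1] * score
        if ordered != expected:
            is_scale = False
    return item_order, is_scale


def reproduce(score, item_order):
    """Recover the full response pattern from the rank (score) alone."""
    n_items = len(item_order)
    pattern = [0] * n_items
    for i in item_order[n_items - score:]:
        pattern[i] = 1
    return tuple(pattern)


# Invented data: (money, books, honors) for ten lawyers of types 1, 4, 5 and 8.
scale_data = [(1, 1, 1)] * 2 + [(0, 1, 1)] * 3 + [(0, 0, 1)] * 4 + [(0, 0, 0)]
item_order, is_scale = scale_pattern(scale_data)
print(item_order, is_scale)
print(reproduce(2, item_order))  # the pattern implied by possessing two items

# Adding a lawyer with money only (type 7) destroys the pattern.
print(scale_pattern(scale_data + [(1, 0, 0)])[1])
```

The `reproduce` function is the reproducibility property in miniature: given only the rank, it returns the one combination of items that rank can have.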

The aggregate of items which permit such a rank order which
is independent of item weights will be called a “scale” and the
universe of which the items are a sample will be called a scalable
universe. Since the universe is scalable, any selection of items
from that universe would result in the same rank order of
objects or persons as any other selection. A scale in the present
usage is therefore an aggregate of items which are so
inter-related as to offer no problem of relative weighting. 5

A Test of “Single Meaning”

To a limited extent, scale analysis can be used as a test of the
“meaning” of items in an effort to eliminate items which do not
belong to the scalable universe. However, there must be an
adequate “content” reason in addition to the “correlational”
analysis. In many cases, the correct decision would be to label
one’s universe of interest as multi-dimensional, and therefore
not scalable, rather than to attempt to tease out a scalable
sub-group of items which no longer reflects the desired universe.

4 Simple techniques for testing a series of items for unidimensionality based upon
the determination of whether or not a parallelogram pattern exists have been
developed. See, for example, Guttman, L., “The Cornell Technique for Scale and Intensity
Analysis,” Educational and Psychological Measurement, VII (1947), 247-280.

5 It is important to remember that many universes will be found to present a
problem of weighting constituent items and that much work remains to be done in
solving the problem of classification for such areas. Scaling is not a solution to the
problem of weighting, but rather a selection of areas which do not present a
problem of weighting.

Let us illustrate this problem of “meaning” by means of an 
example. Suppose, in the previous example of the classification 
of lawyers according to “success,” the item, “Does he have an 
income of over $15,000 a year?” had been asked, instead, as, 
“Does he have an income of over $15,000 a year which he has 
earned honestly?” Whereas an answer of “No” to the former
wording of the question has a clear-cut single meaning, the 
answer of “No” to the latter wording may mean either that he 
does not have a high income or that he has a high income but, 
in the opinion of the respondent, he has not earned it honestly. 
The response to this latter question depends upon the aspect or 
element upon which the subject focuses. The question can have 
more than one interpretation. Such questions have been called 
“double-barrelled,” and their use for classification purposes is 
limited by the fact that different subjects may be responding to 
different aspects of the question. 

While the presence of double-meaning is relatively easy to 
determine in the case of a single question, there is another type 
of double-meaning which is not so easily detected. As was 
discussed in the first section, the study of social phenomena 
involves the sampling of items from a whole universe of items 
characteristic of those phenomena. This use of an aggregate of 
items permits the occurrence of a new type of double-meaning—
different meanings for the different items in the aggregate. This 
problem is quite different from that of double-meaning in the 
single question, as can be illustrated by the following example. 

Suppose, in the selection of items characterizing a successful 
lawyer, we had carefully avoided any single items with possible 
double-meanings. But we now add a fourth item, “Does he 
have children?” We now have the following list of questions: 

1. Does he have an income of over $25,000 a year?

2. Has he written books? 

3. Has he received any honors? 

4. Does he have children? 

Let us assume that there are no double-meanings in any single
one of the above questions. However, a new problem arises.
This problem may be stated as, “Do all of the above questions
deal with the same topic?” This problem is different from the
previous one stated as, “Does this question call for a response
dealing with a single topic?” The new problem is one of
determining the single meaning of a series of questions, each of
which has been judged individually to contain only a single
meaning. In other words, we must now determine whether or
not the single topic studied by each of the items is the same
single topic for all of the items. The problem of meaning for a
single question is, “Does the individual question produce a
response to only a single topic?”, while the problem of meaning
for an aggregate of questions is, “Is the single topic studied by
each of the questions the same for each question?” 6

The proposed parallelogram test for uni-dimensionality would 
serve to indicate in a series of scalable items whether or not any 
of the items did not deal with the same dimension indicated 
by a large majority of the items. Such double-barrelled items 
as “Does he have an income of over $25,000 a year which he 
has earned honestly?” or such extra-dimensional items as “Does 
he have children?” would not conform to the parallelogram 
pattern. 

Summary 

The problem of scale construction has often been stated as
involving (1) the problem of item selection and (2) the problem
of item weights. The present paper offers a logical system for
scale construction which answers these two problems in terms
of a test for uni-dimensionality. Any series of items used in a
scale can be conceived of as a sample of items from an unlimited
universe of items dealing with the variable being studied. If a
test of the inter-relationships of these items shows them to
conform to a defined parallelogram pattern, then the rank order of
objects or individuals based upon these items will be
independent of item weights. Furthermore, in such a case, the rank order
will pertain to the entire universe of items and any selection of
items from that universe will produce the same rank order as
any other selection.

6 This question of meaning, of course, could be stated the same for both single 
items and aggregates of items as follows, “Is a single topic only being studied?” The 
present formulation, however, is important for an understanding of the methods used 
to answer this question for a series of items. 





Thus, according to this approach, the problem of scale
construction becomes a problem of testing a series of items for
uni-dimensionality. If the items conform to the prescribed scale
pattern, then the problems of item selection and item weights
are non-existent. This approach therefore involves a test for
scalability in the area of interest, rather than the construction
of some arbitrary scoring scheme. In this sense, scales can only
be constructed for uni-dimensional variables. If the underlying
variable is shown to be uni-dimensional, the rank order of
objects or persons is independent of item selection and item
weighting. If the underlying variable is shown to be
multi-dimensional, then a meaningful single rank order is impossible.
It is the task of the research worker, therefore, first, to define
his area of interest by listing those items which characterize the
universe in which he is interested, and, second, to test these
items for uni-dimensionality. If the test shows that the universe
is not uni-dimensional, then he cannot construct a meaningful
scale by arbitrary decisions of item selection and item
weighting. If the test shows that the universe is uni-dimensional,
then the problems of item selection and item weights are
non-existent.



VALIDITY, RELIABILITY, AND BALONEY 1 

EDWARD E. CURETON 
University of Tennessee 

It is a generally accepted principle that if a test has
demonstrated validity for some given purpose, considerations of
reliability are secondary. The statistical literature also informs us
that a validity coefficient cannot exceed the square root of the 
reliability coefficient of either the predictor or the criterion. This 
paper describes the construction and validation of a new test 
which seems to call in question these accepted principles. Since 
the technique of validation is the crucial point, I shall discuss 
the validation procedures before describing the test in detail. 

Briefly, the test uses a new type of projective technique which
appears to reveal controllable variations in psychokinetic force
as applied in certain particular situations. In the present study
the criterion is college scholarship, as given by the usual
grade-point average. The subjects were 29 senior and graduate
students in a course in Psychological Measurements. These
students took Forms Q and R of the Cooperative Vocabulary Test,
Form R being administered about two weeks after Form Q.
The correlation between grade-point average and the combined
score on both forms of this test was .23. The reliability of the
test, estimated by the Spearman-Brown formula from the
correlation between the two forms, was .90.
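For readers who want the arithmetic behind these figures, the Python sketch below applies the standard Spearman-Brown formula. It is an illustration only: the paper reports the stepped-up reliability (.90) but not the correlation between Forms Q and R, so the between-form value printed here is merely what the formula implies, and the function names are hypothetical.

```python
import math


def spearman_brown(r_between_forms, lengthening=2.0):
    """Reliability of a test `lengthening` times as long as a single form."""
    return lengthening * r_between_forms / (
        1.0 + (lengthening - 1.0) * r_between_forms
    )


def implied_form_correlation(r_combined, lengthening=2.0):
    """Invert the formula: the between-form correlation implied by r_combined."""
    return r_combined / (lengthening - (lengthening - 1.0) * r_combined)


r_forms = implied_form_correlation(0.90)
print(round(r_forms, 3))                   # implied Form Q vs. Form R correlation
print(round(spearman_brown(r_forms), 2))   # stepping it back up recovers .90

# The ceiling on validity mentioned in the opening paragraph: the square root
# of the reliability coefficient.
validity_ceiling = math.sqrt(0.90)
print(round(validity_ceiling, 3))
```

The observed validity of .23 for the Vocabulary Test sits far below that ceiling, so nothing in these first figures conflicts with the accepted principles.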

The experimental form of the new test, which I have termed
the “B-Projective Psychokinesis Test,” or Test B, was also
applied to the group. This experimental form contained 85
items, and there was a reaction to every item for every student.
The items called for unequivocal “plus” or “minus” reactions,
but in advance of data there is no way to tell which reaction to
a given item may be valid for any particular purpose. In this
respect Test B is much like many well-known interest and
personality inventories. Since there were no intermediate
reactions, all scoring was based on the “plus” reactions alone.

1 This paper was presented in Denver, Colorado, September 7, 1949, at a meeting
sponsored jointly by the Division on Evaluation and Measurement of the American
Psychological Association and the Psychometric Society.

I first obtained the mean grade-point average of all the
students whose reaction to each item was “plus.” Instead of using
the usual technique of biserial correlation, however, I used an
item-validity index based on the significance of the difference
between the mean grade-point average of the whole group, and
the mean grade-point average of those who gave the “plus”
reaction to any particular item. This is a straightforward case
of sampling from a finite universe. The mean and standard
deviation of the grade-point averages of the entire group of
29 are the known parameters. The null hypothesis to be tested
is the hypothesis that the subgroup giving the “plus” reaction
to any item is a random sample from this population. The mean
number giving the “plus” reaction to any item was 14.6. I
therefore computed the standard error of the mean for
independent samples of 14.6 drawn from a universe of 29, with
replacement. If the mean grade-point average of those giving the
“plus” reaction to any particular item was more than one
standard error above the mean of the whole 29, the item was retained
with a scoring weight of plus one. If it was more than one
standard error below this general mean, the item was retained with a
scoring weight of minus one.
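The item-selection rule described above is easy to imitate with random numbers. The Python sketch below is a simulation under stated assumptions, not a reconstruction of the actual data: the grade-point averages are drawn from an arbitrary normal distribution, the “plus” and “minus” reactions are fair coin flips, and the seed is arbitrary. Only the group size (29), the number of items (85), and the one-standard-error rule come from the text.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1950)  # arbitrary seed, fixed for repeatability
N_STUDENTS, N_ITEMS = 29, 85

# Simulated criterion and purely random "plus" (1) / "minus" (0) reactions.
gpa = [rng.gauss(2.5, 0.6) for _ in range(N_STUDENTS)]
reactions = [[rng.randint(0, 1) for _ in range(N_ITEMS)] for _ in range(N_STUDENTS)]

mean_all = sum(gpa) / N_STUDENTS
sd_all = math.sqrt(sum((g - mean_all) ** 2 for g in gpa) / N_STUDENTS)
plus_counts = [sum(row[j] for row in reactions) for j in range(N_ITEMS)]
mean_plus_n = sum(plus_counts) / N_ITEMS
std_error = sd_all / math.sqrt(mean_plus_n)  # sampling with replacement

key = {}  # item index -> scoring weight (+1 or -1)
for j in range(N_ITEMS):
    plus_gpas = [gpa[s] for s in range(N_STUDENTS) if reactions[s][j] == 1]
    if not plus_gpas:
        continue  # no "plus" reactions at all: nothing to compare
    diff = sum(plus_gpas) / len(plus_gpas) - mean_all
    if diff > std_error:
        key[j] = 1
    elif diff < -std_error:
        key[j] = -1

# Score the same students with the key derived from their own data.
scores = [sum(w * row[j] for j, w in key.items()) for row in reactions]
in_sample_r = pearson(scores, gpa)
print(len(key), round(in_sample_r, 2))
```

Although the reactions carry no information about scholarship, selecting and keying items on the same 29 students manufactures a sizable correlation between score and criterion.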

By this procedure, 9 positively weighted items and 15
negatively weighted items were obtained. A scoring key for all 24
selected items was prepared, and the “plus” reactions for the
29 students were scored with this key. The correlation between
the 29 scores on the revised Test B and the grade-point
averages was found to be .82. In comparison with the Vocabulary
Test, which correlated only .23 with the same criterion, Test B
appears to possess considerable promise as a predictor of college
scholarship. However, the authors of many interest and
personality tests, who have used essentially similar validation
techniques, have warned us to interpret high validity
coefficients with caution when they are derived from the same data
used in making the item analysis.

The correlation between Test B and the Vocabulary Test
was .31, which is .08 higher than the correlation between the
Vocabulary Test and the grade-point averages. On the other
hand, the reliability of Test B, by the Kuder-Richardson
Formula 20, was -.06. Hence it would appear that the accepted
principles previously mentioned are called in question rather
severely by the findings of this study. The difficulty may be
explained, however, by a consideration of the structure of the
B-Projective Psychokinesis Test.
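A negative reliability coefficient may look impossible, but Kuder-Richardson Formula 20 yields one whenever the items fail to hang together. The Python sketch below uses the usual textbook form of the formula for items scored 0 or 1, with population variances throughout; the two small data sets are invented for illustration, and the paper does not say exactly how the minus-keyed items of Test B entered the computation.

```python
def kr20(item_scores):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items.

    item_scores holds one row per person and one 0/1 entry per item.
    Population variances (divisor n) are used throughout.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion scoring 1
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Invented example 1: items that cumulate perfectly.
consistent = [(1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
# Invented example 2: items that mostly exclude one another.
inconsistent = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]

print(kr20(consistent))    # positive
print(kr20(inconsistent))  # negative
```

The coefficient turns negative when the sum of the item variances exceeds the variance of the total scores, which is what tends to happen when the item responses are unrelated or mutually opposed.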

The items of Test B consisted of 85 metal-rimmed labelling
tags. Each tag bore an item number, from 1 to 85, on one side
only. To derive a score for any given student, I first put the 85
tags in a cocktail shaker and shook them up thoroughly. Then
I looked at the student's grade-point average. If it was B or
above, I projected into the cocktail shaker a wish that the
student should receive a high “plus” reaction score. If his
grade-point average was below B, I projected a wish that he should
receive a low score. Then I threw the tags on the table. To obtain
the student's score, I counted as “plus” reactions all the tags
which lit with the numbered side up. The derivation of the term
“B-Projective Psychokinesis Test” should now be obvious.

The moral of this story, I think, is clear. When a validity
coefficient is computed from the same data used in making an
item analysis, this coefficient cannot be interpreted uncritically.
And, contrary to many statements in the literature, it cannot be
interpreted “with caution” either. There is one clear
interpretation for all such validity coefficients. This interpretation is—


“Baloney!”



RESPONSE SETS: A NOTE ON CONSISTENCY 
IN TAKING EXTREME POSITIONS 

EDWARD A. RUNDQUIST 
Owens-Illinois Glass Company

“A response set is ... any tendency causing a person to 
give different responses to test items than he would when the 
same content was presented in different form.” Thus Cronbach 
(1) defines response sets in a recent summary of the wide range
of situations in which such sets have been found. 

In personality testing, response sets can be deliberate at¬ 
tempts to deceive, reflections of basic drives or traits, reflec¬ 
tions of a particular frame of reference, or a temporary set 
brought about by a particular way of interpreting the direc¬ 
tions. On just what a response set reflects and how consistently 
it reflects it, will depend the importance that is attributed to it. 
If a response set is transient and dependent primarily on the 
given conditions of an immediate situation, interest will be 
confined to controlling its influence so it does not interfere 
with the interpretation of test results. If, however, it influences 
behavior in a variety of situations over a long period of time, 
it would be worthy of careful study as a means of personality 
measurement. 

Among other response sets noted by Cronbach is the tendency
to take the extreme positions on scales of the
Like–Indifferent–Dislike or the Agree–Undecided–Disagree type.
This note reports on the consistency of this tendency in two
situations, one immediately following the other. In the first,
111 factory girls, all doing the same work, described themselves
by indicating how well each of 200 descriptive words and
phrases applied to them; in the second, they indicated how well
they liked or disliked each of 100 activities.

As Cronbach notes, to measure the consistency of any response
set, the situations involved must allow equal opportunity
for it to be called out, i.e., the situations must be equally
indefinite or unstructured. Whether the personality and interest
items with their respective directions provide this may
be judged from the material appended to this note. To the
writer there seems at least approximately equal opportunity
for the set to take extremes to operate. The fact that the two
forms were presented in immediate succession, the personality
form first, would increase the likelihood for the same set to
operate while taking both forms.

Substantial individual differences exist in the tendency to
take the extreme position. The mean and sigma for the 200
personality items are 74.95 and 37.28; for the 100 interest items,
37.47 and 14.24. To obtain these scores, the A and E responses
(see key at end of paper) for each series were counted.

The correlation between this tendency on the two series of 
items is .40. This is significantly different from zero. (Sigma of 
an r of zero with an N of 111 is .1.) 
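The figure of .1 can be checked with a few lines of arithmetic. The sketch below assumes the customary large-sample formula, 1 / sqrt(N - 1), for the sigma of a correlation whose true value is zero; the note does not say which formula was used, and the function name is hypothetical.

```python
import math


def sigma_of_zero_r(n):
    """Standard error of r when the true correlation is zero, taken as 1 / sqrt(N - 1)."""
    return 1.0 / math.sqrt(n - 1)


se = sigma_of_zero_r(111)        # sigma of an r of zero for the 111 cases
critical_ratio = 0.40 / se       # the obtained r expressed in sigma units

print(round(se, 3), round(critical_ratio, 1))
```

On this assumption the obtained correlation of .40 is some four times its sigma, which supports the statement that it differs significantly from zero.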

There is, then, a real tendency for those who take extreme
positions in describing their traits to take extreme positions
in describing their interests. On the basis of a consistency represented
by a correlation of .4 in two similar and immediately
successive situations, it is hard to believe this particular response
set is reflecting anything basic about the individual.
It seems rather that it is largely a function of the type of material,
interpretation of directions, mood, or some other temporary
condition. At least with a consistency of .4, we would
not expect a measure to be very useful in predicting a criterion
such as behavior on a job. With personality and interest items
of the kind dealt with here, it would seem more profitable to
eliminate the operation of this response set rather than to attempt
to use it as a measure.

Directions for Personality Items 

On the following pages are words and phrases used in describing
people. You are to describe yourself by indicating how
well each description applies to you. Use the following key:

Key: A. Describes me perfectly or almost perfectly.
     B. Describes me unusually well.
     C. Describes me fairly well.
     D. Describes me some but not very well.
     E. Describes me slightly or not at all.





Indicate your answer by putting the letter that applies on the 
line in front of each description. Suppose the word is “helpful.” 

If you feel it describes you fairly well, you would place a C 
on the line before it, thus: 

C Helpful 

If you feel that this word describes you some but not
very well, you would place a D on the line, thus:

D Helpful

If you feel that the word describes you perfectly or almost
perfectly, you would put an A on the line, thus:

A Helpful

Look at each word or phrase and decide how well it describes 
you. Do not worry about being consistent but consider each 
description by itself. Do not skip any. 

1. Cheerful
4. Know my own mind
8. Cooperative (like to help people)
13. Restless (never still a minute)
14. Worry about the future
24. Have high ideals
28. Stubborn
41. Like to be different
44. Jealous
46. Always on time

Directions for Interest Items 

On the following page is a list of activities. Indicate how 
much you like or dislike each one. Use this key in indicating 
how much you like or dislike it. 

Key: A. Like a great deal
     B. Like some
     C. Neither like nor dislike
     D. Dislike some
     E. Dislike a great deal

You may not have done all the activities listed. Further,
some require training which you may not have had. For these
indicate how much you think you would like them if you
tried them and if you had the proper training. Answer every
item. Give your first reactions. Work rapidly.

3. Work around machinery
4. Arrange flowers
8. Teach English
10. Soft and slow music
16. Visit a canning factory
18. Go to parties often
19. Trying out new cooking recipes
28. Read a book
30. Tidy up the house
38. Look up words in a dictionary

REFERENCE 

1. Cronbach, L. J. “Response Sets and Test Validity.” Educational
and Psychological Measurement, VI (1946), 475-494.



THE INTERESTS OF ART STUDENTS 


WALTER R. BORG 
University of Texas 

Introduction 

The aim of this study is to attempt to answer the following 
questions: 

1. Does the Kuder Preference Record differentiate an art
group from general-population samples with respect to
art-interest scores?

2. Is success in art courses significantly related to areas of
interests as measured by the Kuder Preference Record?

3. Do Kuder profiles of groups of students specializing in
different areas of art differ significantly?

Preliminary Study

The Strong Vocational Interest Blank was used by the investigator
in a preliminary study of 85 upper division art-college
students and was not found to be useful in differentiating
levels of artistic ability or revealing individual art interests,
although, as a group, the art students studied were above the
norms for non-art groups. Only 38 per cent of the art group
studied in the preliminary investigation received “A” ratings in
art interest although this group was made up entirely of advanced
art students. Seventy-two per cent of Strong’s criterion
group, consisting of 124 painters, 79 commercial artists, 20
sculptors, and 9 cartoonists, made “A” art ratings on the Vocational
Interest Blank.¹ The mean score for this group is given as
176.80 with a standard deviation of 88.08.² The mean for the
group tested in the preliminary study conducted by the author
was 86 with a sigma of 100. Great differences between the
general makeup of Strong’s criterion group and the art-college
students tested possibly account for the differences in score. For
example, the average age of the criterion group is given by
Strong as 42.7 years, with average education 11.9 grade, indicating
a much more mature and somewhat less-educated
group than the advanced art students tested.³ Because of the
above findings, it was decided to use the Kuder Preference
Record, Form BB, in the present study.

1 Strong, E. K. Vocational Interests of Men and Women. Stanford Univ.: Stanford
University Press, 1943. Page 730.

2 Taken from norms supplied with artist scoring key for Strong test.

Present Study 

A total of 427 students at the California College of Arts and 
Crafts at Oakland, California, were used as subjects in this 
study. Of this group 299 were men (median CA 22-8), and 128 
were women (median CA 19-7). Only students having com¬ 
pleted nine or more semester units of art work at the school 
were studied. 

Grade averages in art courses were used as the criterion for
art-college success. Reliability of art grades was computed by
comparing first-semester grade averages with subsequent grades
of 92 students having completed more than 45 semester units
of art work. A correlation of .84 was obtained, thus indicating
that art-course grades in this college are reasonably reliable.

The Kuder Preference Record is scored for nine areas of interest:
(1) Mechanical, (2) Computational, (3) Scientific, (4)
Persuasive, (5) Artistic, (6) Literary, (7) Musical, (8) Social
Service, and (9) Clerical. As each response constitutes a choice
of one area over two others, a picture of relative interest is 
given and not absolute interest, as is the case with the Strong 
test. The Kuder test has several advantages for research. Prob¬ 
ably most important is the ease of scoring and the possibility 
of analyzing the scores. It is also comparatively easy to con¬ 
struct norms for selected groups when using the Kuder test and 
this was considered to be a useful undertaking as norms given 
by Kuder for art students and artists are not as complete as 
could be desired. 


Results 

Scores earned on area five of the Kuder Preference Record
are intended to indicate interest in art. The mean scores of the
427 students in the art group used in this study are closely in
agreement with the twenty-one cases given in the test Manual.
Kuder gives a mean score in art interest of 85.07, which equals
a percentile score of 96. Men in the art group studied had a
mean score of 87.50 (99th percentile), with the women’s mean
being 85.52 (96th percentile) or almost identical to the women
artists studied by Kuder. The art group showed considerably
less variability than Kuder’s norm group, the sigmas being
9.20 and 10.82 respectively.⁴

3 Strong, loc. cit.

The correlation between interest scores on the Kuder art
scale and art-course grade-point average was found to be only
.08, which is not statistically significant (t equals 1.66). Comparison
of the upper and lower 27 per cent of the subjects with
respect to grade-point average in art courses revealed a small
but significant difference between means. The mean of the
upper group was 88.26 and that of the lower group 85.91, while
the critical ratio of the difference was 2.32. The difference in
scores between men and women was also slightly significant in
favor of the men, the critical ratio also being 2.32.

TABLE 1

Summary of Scores of Various Groups on the Kuder Art Scale

Group                               N             Mean     Sigma
Women Artists and Art Teachers      21            85.07    10.82
Total Art Group studied
  Men                               299           87.50     9.20
  Women                             128           85.52    10.82
Upper 27% of Art Group              115           88.26     6.85
Lower 27% of Art Group              115           85.91     8.50
Fine Arts Group                     57            86.87    10.10
Commercial Art Group                276           87.11     9.12
Art Teacher Group                   [illegible]   86.47     9.07
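The significance figures for the correlation and for the upper and lower groups can be approximately reproduced from the reported values. The sketch below assumes the usual t ratio for a correlation and a critical ratio based on the standard error of the difference between two independent means; the study does not state its formulas, and the function names are hypothetical.

```python
import math


def t_for_r(r, n):
    """t ratio for testing a product-moment correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of the difference."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


# Correlation of .08 between art-interest score and art grades, N = 427.
t = t_for_r(0.08, 427)

# Upper versus lower 27 per cent on grade-point average (115 cases each).
cr = critical_ratio(88.26, 6.85, 115, 85.91, 8.50, 115)

print(round(t, 2), round(cr, 2))
```

Both values come out within a few hundredths of the figures reported, the small differences presumably arising from rounding or from a slightly different formula.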

In comparing three groups of students specializing in different
areas of art at the California College of Arts and Crafts,
no significant differences were found in their scores on the Kuder
art-interest scale. The means for the commercial art students,
fine arts students, and art-teaching students were 87.11, 86.83,
and 86.47 respectively. This places the means of all three groups
between the 97th and 98th percentiles in art interest. Scores on
the Kuder Art Scale for the various groups tested may be found
in Table 1. It may be concluded that the Kuder Art Scale is
valuable in differentiating the art group from the general population.
The low correlation with art grades indicates that it is
not useful as an indicator of the degree of talent within an art
group. This correlation would probably be much higher in a
more heterogeneous group.

4 Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: Science
Research Associates, 1948. Page 12.

In addition to an analysis of the performance of the art group 
as a whole, it was decided to compare the performance of the 
three art-area groups of commercial art students, fine arts 
students, and art-teaching students in detail and construct 
profiles for them. It was considered most practical to first study 
these three area groups without regard for sex because of the 
small number of cases. Thus, the groups were first considered 
as a whole, and then the commercial art groups and the men's 
teaching group which contain sufficient cases were studied with 
respect to sex. 

Table 2 gives scores of the three art-area groups on the nine
Kuder Scales. It will be noted that there are no significant differences
among the three groups in art interest, all scoring above
the 95th percentile for both men’s and women’s norms. The
commercial art group scored significantly higher than the other
two groups in mechanical interest; it was superior to the teaching
group in scientific and clerical interest, and was superior to
the fine arts group in persuasive interest.

Table 3 shows a comparison between men's and women's
raw scores and percentile scores in the commercial art group.⁵
Although considerable sex difference exists, it will be seen from
comparing raw scores and percentiles that these differences are
markedly less than those given in the norms. For this reason
it is probable that, until more complete norms are published,
a comparison of raw scores would be simpler and more valid
than conversion to norm-group percentiles when dealing with
art students. In examining the performance of the art-teaching
group it will be seen that this group scored significantly above
the other groups in social service interest. In spite of this
difference, the average score of the teaching group in this area
is only 70.52, which is below the 50th percentile on the test
norms, indicating that consideration of scores in all areas, regardless
of percentile rank, may be more useful in some cases
than restricting attention to extreme scores. The male art-teaching
students were considered separately and their average
scores did not differ markedly from the entire teaching group in
any of the nine Kuder areas, thus giving some justification for
using the same raw-score norms for both sexes until more complete
norms are established by further research.

5 Percentile Scores taken from profile sheet for the Kuder Preference Record.

TABLE 2

[Scores of the three art-area groups on the nine Kuder scales. The body of this table is illegible in the source.]

The fine arts group scored significantly above the other two
groups in literary interest and also scored highest in music interest,
being significantly above the commercial art group. Because
of the small size of the group no sex differences were computed.
Some data which may be of help in evaluating the
performance of art students with respect to raw scores on the
Kuder Preference Record may be found in Tables 2 and 3.


TABLE 3

Comparison between Commercial Art Group Raw Scores and Percentile Scores on the
Kuder Art Scale

Group                  Mec     Com     Sci     Per     Art     Lit     Mus           Soc           Cle
Men's Raw Scores       73.55   24.99   51.73   69.87   87.15   51.45   [illegible]   56.04         45.53
Women's Raw Scores     59.95   13.59   48.52   62.83   87.22   47.68   19.41         68.73         48.97
Men's Percentiles      38      16      13      46      99      63      73            [illegible]   31
Women's Percentiles    71      13      37      54      98      41      41            22            22


Summary and Conclusions 

With regard to the questions stated in the opening paragraph, 
the following conclusions may be drawn: 

1. The group of art students in this study scored very high on 
the Kuder Art Scale, the men averaging 99th percentile and the 
women 98th percentile, thus differentiating them adequately 
from the general population. 

2. The correlation between art-course success and art-interest 
scores is not significant for the group studied. The homogeneity 
of the art students with respect to level of art interest in part 
accounts for this low correlation. 

3. A comparison of interest profiles for commercial art, art
teaching, and fine arts students reveals that significant differences
do exist. The commercial art group is significantly superior
to both other groups in mechanical interest, exceeds the
fine arts group in persuasive interest, and is significantly above
the art-teaching group in scientific interest and clerical interest.

The art-teaching group is significantly superior to the commercial
art group in music interest and exceeds the fine arts
group in persuasive interest. The chief characteristic of the
art-teaching group, however, is its social service interest which is
significantly above that of the other art groups.

The fine arts group was high in literary interest and low in
persuasive interest, being significantly different from the other
groups in both. The fine arts group also exceeds the commercial
art group in music interest.

All three groups score highest in art interest, but are very
similar, all averaging between 95th and 98th percentiles according
to the Kuder norms. These findings agree quite closely with
interest clusters suggested by Kuder in the test Manual. Further
study is necessary before the norms found in this investigation
can be regarded with complete confidence.



A FACTORIAL INVESTIGATION OF FLEXIBILITY 1 

ROBERT W. KLEEMEIER 
and 

FRANK J. DUDEK 
Northwestern University 

In a previous investigation, performance on certain tests 
which were designed to measure flexibility seemed to be in¬ 
fluenced by the ingestion of Benzedrine sulfate to a greater 
extent than was performance on "non-flexibility” tests (3). 
This evidence was not strong but was, none the less, provoca¬ 
tive. The present study was designed to investigate more 
thoroughly the nature of flexibility by subjecting modifications 
of these tests to a more rigorous analysis. For the purpose of 
this study flexibility is defined simply as the ability (a) to 
shift from one task to another, or (b) to break through an es¬ 
tablished set in order to perform a task. We have preferred to 
use the term "flexibility” rather than the word “persevera¬ 
tion,” which has frequently been used to describe the abilities 
measured by tests of the general kind used here, because the 
latter term so often has associated with it specific theoretical 
connotations, e.g., Spearman’s mental inertia, Muller and 
Pilzecker’s usage as a memory phenomenon, etc. 

In an attempt to make the results as unambiguous as possible 
it was decided to investigate only one type of performance, 
viz., performance in which S would be required to shift tasks. 
Only simple tasks were used in the hope that factors would be 
more easily identified. Tests were designed to measure numeri¬ 
cal, perceptual speed, and verbal factors. Within each area the 
attempt was to make some of the tests factorially pure. One 
test in each area, however, was designed to measure flexibility 
by requiring S to shift from one simple task to another. It 
was anticipated that factors associated with number, perceptual 

1 This study was aided by a grant from the Committee on Research of the Graduate 
School of Northwestern University. 




speed, and verbal abilities could be isolated from this battery. 
The important consideration, however, was whether or not a 
factor which was common primarily to the tests requiring shifts 
of tasks would also emerge. If those tests which required shifts 
of tasks appeared on an independent axis, regardless of the 
type of ability represented, there would be evidence for a factor 
which might be called “flexibility” common to different types
of tasks.


Description of Tests


Thirteen tests comprised the battery analyzed in this study. 
All tests were speed tests and, with the exception of the Same- 
Opposite Test, were administered in two parts so that estimates 
of test-retest reliabilities could be made. All tests were an¬ 
swered on separate IBM answer sheets. The various tests were: 

Single Digit Numbers Tests (SDN). Each of these tests consisted
of 120 items administered in two parts of 60 items each.
The time limit for each part was 90 seconds. S's task was to
indicate whether answers of the problems as given were right
or wrong. Sample items from each test are:


1. Subtraction
      1. 8 - 3 = 6
      2. 7 - 4 = 3
      3. 3 - 1 = 2

2. Addition
      1. 7 + 2 = 9
      2. 5 + 6 = 12
      3. 8 + 2 = 11

3. Mixed
      1. 9 - 4 = 13
      2. 8 + 4 = 12
      3. 3 + 6 = 10
      4. 3 - 2 = 1
      5. 8 + 5 = 12
      6. 7 - 1 = 6


Tests were administered in the following order: Subtraction 
(Part I); Addition (Part I); Mixed (Part I); Mixed (Part II); 
Addition (Part II); Subtraction (Part II). 
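The task set by these items, judging a printed answer right or wrong, can be made explicit with a short keying routine. This is only a sketch: the function name `key_item`, the plain-text item format, and the choice of the Subtraction and Addition sample items are assumptions for illustration and form no part of the test materials.

```python
import re

# One item: two whole numbers joined by + or -, an equals sign, and the printed answer.
ITEM = re.compile(r"\s*(\d+)\s*([+-])\s*(\d+)\s*=\s*(\d+)\s*")


def key_item(item):
    """Return True if the printed answer is right, False if it is wrong."""
    match = ITEM.fullmatch(item)
    if match is None:
        raise ValueError(f"unrecognized item: {item!r}")
    a, op, b, shown = int(match[1]), match[2], int(match[3]), int(match[4])
    actual = a + b if op == "+" else a - b
    return actual == shown


samples = ["8 - 3 = 6", "7 - 4 = 3", "3 - 1 = 2", "7 + 2 = 9", "5 + 6 = 12", "8 + 2 = 11"]
key = [key_item(s) for s in samples]
print(key)
```

In the Mixed test the same routine applies, the only change being that S cannot know in advance which operation the next item will require.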

Two Digit Numbers Tests (TDN). These three tests were
the same as their counterparts in SDN tests, except that each
of the numbers to be added or subtracted consisted of two
digits, e.g., 11 + 36 = 47. In no case were the sums greater
than two digits, although remainders were either one- or two-digit
numbers. Two and one-half minutes were allowed for
work on each part of the test. The tests were given in the following
order: Addition (I); Subtraction (I); Mixed (I); Mixed
(II); Subtraction (II); Addition (II).

Same-Opposite Test (SO). This test was comprised of 60
of the more difficult pairs of words drawn from various forms
of the Army Alpha Test 6 (7). S indicated whether the words
had the same or opposite meanings. The time limit for the
test was two and one-half minutes. Sample items are:

1. acme-climax
2. ligature-band
3. abstruse-recondite

Word Completion Tests (WC). Each of these tests consisted
of 60 items. Each test was administered in two parts of 30
items each with a time limit of 90 seconds for each part. S's
task was to select the one letter from among five alternatives
which formed a word when used with a given stem of three
letters. None of the stems were three-letter words in and of
themselves. The tasks and sample items from each test are:

1. Add final letter. (In this test S had to select the letter which
made a common four-letter word when added to the end of
the stem.)

                1  2  3  4  5
      1. cen    d  c  a  t  r
      2. lam    r  b  m  f  w

2. Add initial letter. (In this test S had to select the letter which
made a common four-letter word when added in front of the
stem.)

                1  2  3  4  5
      1. oun    s  f  h  n  b
      2. alf    c  u  t  a  k

3. Mixed. (In this test letters might go either in front or in back
of the stem to form a word. No cues other than the stem
itself were given as to whether the answer was an initial or
final letter.)

                1  2  3  4  5
      1. pur    q  m  j  e  t
      2. ite    a  h  j  b  y




The tests were administered in the following order: Final letter
(I); Initial letter (I); Mixed (I); Mixed (II); Initial letter (II); 
Final letter (II). 

Perceptual Speed Tests (PS). Each of these tests consisted
of 60 items administered in two parts of 30 items each. The
time limit for each part was two minutes. Each item consisted
of a line of 30 capital block letters. S's task was to count the
number of times some particular letter appeared. This letter
appeared from one to five times in each line. The number of
times it occurred was also the answer for the item which was
entered directly on the IBM answer sheet. Sample items from
the various tests are:

1. “N” test. (In this test the number of “N’s” occurring in a
line of “M’s” was counted.)

      1. MMMMMMMMNMMMMMNM...
      2. MNMMMMMNMMMMMNMM...

2. “W” test. (In this test the number of “W’s” occurring in a
line of “M’s” was counted.)

      1. MMMMMMMMMWMMMMMMMM...
      2. MWMMMMMMWMMWMMMMWM...

3. Mixed. (In this test each line consisted of “M’s”, “N’s”,
and “W’s”. At the beginning of each line the letter to be
counted was indicated in parentheses.)

      1. (W) MMMMNMMMWMMNWMNNM...
      2. (N) NMMMMWNMNWMMMMMMM...
      3. (N) MMMMMNNWMMMNMMWMW...

The order in which the tests were administered was: “N’s”
(I); “W’s” (I); Mixed (I); Mixed (II); “W’s” (II); “N’s” (II).

Population 

The test battery was administered to 205 college students.
These were tested in groups ranging in size from eight to 48
S's. These tests were given to 104 S's in the order in which
they are described above, while 101 were given the tests in the
reverse order. Since mean scores of groups showed no significant
differences attributable to order, all data were combined
into one group.

Results 

Table 1 shows the zero-order intercorrelations of the tests in
the battery. It will be noted that all correlations are positive,
ranging in size from .908 (r₅₋₆) to .072 (r₇₋₁₁). Raw-score means
and S.D.’s are also presented in this table. Examination of
these means shows that alternation or mixed tasks in the
number tests appear to be of the same order of difficulty as
the straight addition and subtraction tasks. This is true for
both the Single Digit and the Two Digit tests. This result
would not have been anticipated from results of the Benzedrine
study. In that study mixed addition and subtraction
problems were markedly more difficult than either the addition
or the subtraction tests alone. Tests used in the earlier study
were just like the Single Digit Numbers tests except that S
wrote his answer beside the problem instead of indicating on a
separate sheet whether or not the given answer was correct.
Apparently it is just this difference that accounts for the divergence
in results obtained here. In both the Word Completion
tests and the Perceptual Speed tests the mixed sections yield
significantly lower scores than do the other sections.

TABLE 1

Intercorrelations, Means, and S.D.'s of Flexibility Battery
(N = 205)

[The body of this table is illegible in the source.]

TABLE 2

Factor Loadings and Communalities of Variables in Flexibility Battery

            Centroid Loadings                                   Rotated Loadings
Test     I            II     III    IV     h²     [illegible]   I     II            III           IV    h²
 1.      717          -306   342    -226   791    80            334   149           799           156   796
 2.      777          -291   350    -259   878    87            337   196           836           160   876
 3.      801          -248   297    -213   837    84            368   236           780           192   836
 4.      847          222    272    236    896    90            287   417           404           691   897
 5.      854          205    292    238    913    91            293   403           426           697   916
 6.      864          198    264    229    908    91            314   417           421           678   909
 7.      367          309    -094   -086   246    30            057   464           064           143   243
 8.      700          388    -277   -223   767    [illegible]   253   815           152           109   763
 9.      694          427    -203   -204   747    74            198   803           174           169   743
10.      604          354    -262   -211   603    61            214   731           120           077   600
11.      [illegible]  -429   -298   228    628    64            776   029           129           081   626
12.      636          -423   -376   233    779    78            863   [illegible]   [illegible]   081   776
13.      713          -408   -313   247    840    83            876   132           177           150   839

Table 2 contains the centroid and the rotated factor loadings.
Four factors were extracted from the intercorrelation matrix.
No significant residuals remained after the fourth factor was
extracted. As a matter of fact, the fourth factor itself contributes
little to any correlation. Rotations of these factors were made to
satisfy, insofar as possible, criteria of simple structure and positive
manifold. It is apparent that those tests which had been
designed to measure flexibility (tests 3, 6, 10, and 13) do not
group themselves along an independent axis, but rather can
be accounted for in terms of number, perceptual, and verbal
factors depending upon the type of test. The factors are relatively
easy to identify according to the nature of the task.

Factor I (Perceptual Speed, P). The highest loadings in this
factor are found in the PS tests (tests 11, 12, and 13). It is not
surprising that SDN tests (tests 1, 2, and 3) show some saturation
in this factor. The simplicity of the problems, for college
groups at least, is such that perceptual speed might well influence
the speed with which correct and incorrect answers
are recognized.

Factor II (Verbal, V). Tests 7, 8, 9, and 10 show the highest
loadings in this factor. Test 7, Same-Opposite, shows less loading
in this factor than might be expected, but, none the less,
its relationship to the word completion tests is unmistakable.
The TDN tests show minor loadings in this factor. Some correlation
between verbal and numerical factors has often been
observed. This relationship may depend on the complexity of
the numerical task, since the SDN tests show practically no
loading on this factor.

Factor III (Single Digit Number [SDN]). Clearly, tests
1, 2, and 3 have the highest loading in this factor. Considering
the apparent similarity between the SDN tests and the TDN
tests one might have expected the latter to exhibit higher
loadings in this factor. The TDN tests, however, came out on a
factor of their own.

Factor IV (Two Digit Number [TDN]). This factor shows
the highest loadings in the TDN tests. Relatively little of the
variance of other tests in the complete battery can be accounted
for on the basis of this factor.

Table 3 shows the per cent of the total variance of each test 
attributable to each factor. Thus, 64 per cent of the variance 
of test 1 (SDN, subtraction) can be accounted for by the SDN 
factor, 11 per cent by the factor P and only 4 or 5 per cent by 
both factors V and TDN combined. The sixth column (h²)
shows the communalities computed from the rotated factor
loadings for each test, or the per cent of variance in each test
accounted for by the four common factors. The specificity
(Sp) of each test (column seven) is the difference between the
reliability of the test (r₁₁, column eight) and h². (Since h² cannot
be greater than r₁₁, when this occurs it is apparently due to
slight errors in estimating one or the other value.) It will be
noted that Sp is close to zero for all tests with the exception of 
the Same-Opposite test. Since the Same-Opposite test was a 
speed test and since it was given in only one part, no reliability 
was computed. However, it was assumed that the reliability 
might be near .80. If this estimate is not seriously in error, 


TABLE 3

Factor Variance Accounted for by Factors Isolated in Flexibility Battery

Test     I       II      III     IV      h²     Sp     r₁₁     Ev
  1    .1116   .0222   .6384   .0243    .80    .00    .80     .20
  2    .1136   .0384   .6989   .0256    .88    .00    .87     .13
  3    .1354   .0557   .6084   .0369    .84    .07    .91     .09
  4    .0824   .1739   .1632   .4775    .90    .00    .89     .11
  5    .0858   .1624   .1815   .4858    .92    .00    .92     .08
  6    .0986   .1739   .1772   .4597    .91    .02    .93     .07
  7    .0032   .2153   .0041   .0204    .24    .56    .80*    .20*
  8    .0640   .6642   .0231   .0119    .76    .02    .78     .22
  9    .0392   .6448   .0303   .0286    .74    .00    .73     .27
 10    .045?   .5344   .0144   .0059    .60    .02    .62     .38
 11    .6022   .0008   .0166   .0066    .63    .16    .79     .21
 12    .744?   .0110   .0135   .0066    .78    .06    .84     .16
 13    .7674   .0174   .0313   .0225    .84    .00    .76     .24

* Estimated.


then the test shows a high degree of specificity. Apparently the 
verbal factor isolated here is not the only one necessary to 
explain scores on a rather difficult word meaning test. 

Error variance (Ev) is shown in the last column of Table 3.
It is the difference between 1.00 (assumed total variance of 
each test) and the reliability coefficient (non-error variance). 
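
The relations among the last four columns of Table 3 can be checked with a few lines of arithmetic. The sketch below recomputes communality, specificity, and error variance for tests 1 and 11 from the tabled factor variances and reliabilities. The function name is hypothetical, and treating a negative specificity as zero is an assumption made here, following the remark that such cases reflect slight errors of estimation.

```python
def variance_breakdown(factor_variances, reliability):
    """Split a test's unit variance into common, specific, and error parts."""
    communality = sum(factor_variances)
    # Specificity is reliability minus communality. A negative difference is
    # set to zero (assumption: it reflects estimation error, not real variance).
    specificity = max(reliability - communality, 0.0)
    # Error variance is what remains of the assumed total variance of 1.00.
    error = 1.0 - reliability
    return round(communality, 2), round(specificity, 2), round(error, 2)


# Factor variances (I, II, III, IV) and reliabilities as read from Table 3.
test_1 = variance_breakdown([0.1116, 0.0222, 0.6384, 0.0243], 0.80)
test_11 = variance_breakdown([0.6022, 0.0008, 0.0166, 0.0066], 0.79)

print(test_1)    # (communality, specificity, error variance) for test 1
print(test_11)   # the same three values for test 11
```

For test 11 the three values come out as .63, .16, and .21, which agrees with the tabled entries.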

Discussion 

The analysis of the data presented above suggests strongly 
that no factor of flexibility need be postulated to account for 
differences in performance on these simple alternation tasks. 
Since Spearman spoke so enthusiastically about the factor of 
perseveration, relatively little evidence has been put forth to 
substantiate the hypothesis (5). Most previous investigators
have attempted to investigate the problem by using batteries
of tests which had been designed to measure anything that 
might possibly be considered under the term perseveration. 
Notcutt, for example, used a battery of 15 tests designed to 
measure sensory perseveration, motor perseveration (both crea¬ 
tive effort and alternation type), and associative perseveration 
(4). Cattell likewise has used batteries of fairly complex tests; 
indeed, Notcutt borrowed extensively from Cattell’s tests (1).
It would seem that the aim of these investigators differed 
from ours. They were attempting to identify a general per¬ 
severation factor which would influence the total behavior of 
the person. Their postulate, based on Spearman's "Law of 
Inertia,” led them to hope that this factor would pervade all 
sensory and motor activities. It should be found in learning 
and would be an important factor of temperament and as such 
should influence feeling, attitude, apperception, and even the 
"natural rhythm” of the individual. Experimental results do 
not support such a general factor (2). 

In this study we have steered clear of perseveration conceived 
of as an all-pervasive factor. This concept has been avoided 
by the adoption of the term flexibility, and with the use of
simple tests scored in a simple way. At this level of simplicity
it is quite evident that a general factor of flexibility does not 
exist. Perhaps it may be demonstrated if a more complicated 
series of shifts between equally well-established habit patterns 
were to be required of S. Thus within a single factor area, say 
Number, S could be tested in addition, subtraction, multiplication,
division, and various combinations of these functions.
To this could be added various ways in which the problems 
could be presented. Flexibility would be demonstrated if the 
mixed tests should group themselves along an independent 
axis. 

Notcutt presents results which appear at first glance to be
at variance with our findings (2). He states that in his battery
of tests the alternation tasks “reveal a genuine though small
factor.” His method of analysis involved the averaging of the
intercorrelations of five alternation tasks. The average correlation
thus obtained was 0.181 ± 0.030. The tasks required in
these tests were such things as writing H’s then ffi’s, and writing
ABCD, then abcd. The method used in this analysis seems
somewhat tenuous for this result to be accepted with confidence.
It would be interesting to determine what factors would
emerge from an analysis of his intercorrelation matrix.

Scoring.—Earlier writers have been greatly concerned about
the problem of scoring alternation tests. Since they were interested
in the hindering or facilitating effects of the alternation
task as compared to the homogeneous task, scores were
sought to express this relationship. Cattell criticizes the use of
the simple difference score (X - Y) by pointing out that this
score will be highly correlated with speed (1). Thus if X =
the score on the critical task and Y = the score on the homogeneous
task, a slow worker would get a smaller flexibility score
than a fast worker even though both experienced an equal
amount of interference. Therefore, he used the ratio X/Y in
scoring his tests. This scoring is satisfactory as long as one is
working with tests requiring S to overcome habitual sets such
as Cattell’s triangle, reversed letter, and cancellation tests.
Thus, on his reversed letter test Y would equal the score on
writing the letters opqrst and X would equal the score on
writing these letters in the reversed order tsrqpo. Walker,
Staines, and Kenna, however, point out that this method cannot
be used for alternation tests such as those in the present
study (6). To do so one would let Y = the combined score on
the homogeneous tasks, e.g., Addition plus Subtraction, and
X = the score on the mixed task. The scoring formula X/Y
under these conditions can give an accurate picture of an interference
effect only if the speed of work on the two homogeneous
tasks is equal. If one of the tasks is inherently more difficult
than the other, this method of scoring will show an
artifactual interference effect even though none may exist. 2
They suggest, therefore, that the 

Interference Score = E/A 

where E = expected score on the alternation task,
and A = actual score on the alternation task. 


2 As the authors point out, if S could do 60 addition problems in one minute and
only 30 subtraction problems in one minute, his rate for addition would be one problem
a second, and for subtraction one problem every two seconds. If he were doing both
addition and subtraction (mixed) in one minute he should do 40 problems, and in two
minutes 80 problems. On the other hand, if he spends one minute doing addition
problems and one minute doing subtraction problems, he will finish a total of 90 problems.
Here the scoring formula X/Y = 80/90 = .89. The interference shown is an
artifact.

Thus if 60 seconds is devoted to each task,

\[ E = \frac{60}{T_1 + T_2} \]

where T₁ = the time to do each unit of the first task and is given by
the formula

\[ T_1 = \frac{60}{\text{score on the first activity}} \]

and

\[ T_2 = \frac{60}{\text{score on the second activity}}. \]
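
The expected score and the E/A ratio can be illustrated with the figures of footnote 2 (60 additions and 30 subtractions per minute). The following is only a sketch: the function names are hypothetical, and the actual mixed score of 40 is the no-interference figure from the footnote, not an observed value.

```python
def expected_score(score_first, score_second, seconds=60):
    """Expected alternation score E when `seconds` are given to each task."""
    t1 = seconds / score_first    # seconds per unit on the first task
    t2 = seconds / score_second   # seconds per unit on the second task
    return seconds / (t1 + t2)


def interference_score(score_first, score_second, actual):
    """Interference score E/A for an actual alternation score A."""
    return expected_score(score_first, score_second) / actual


# Footnote example: one addition per second, one subtraction per two seconds.
e = expected_score(60, 30)
# A subject alternating with no interference finishes 40 problems in a minute.
ratio = interference_score(60, 30, 40)

print(e, ratio)
```

With the formula as printed, E is 20, that is, half of the 40 problems named in the footnote, so that a subject showing no interference obtains E/A = .50.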


While the logic of these scoring methods is sound, it does not 
seem necessary to resort to them in correlational studies. In 
fact their use makes it very difficult to determine the relation¬ 
ship between the flexibility and the nonflexibility tasks. Thus, 
while the use of such scores makes it possible to determine 
whether or not interference exists, it is impossible to determine 
from them whether or not the interference is uniformly ex¬ 
perienced by all S’s (high positive correlation) or is a factor 
unrelated to performance on the nonflexibility tasks. On the 
other hand, should the E/A ratio fail to give indication of in¬ 
terference, it still does not seem admissible to conclude that 
interference was not a factor. Thus, using this ratio on the obtained
mean scores on the SDN tests (Table 1), an interference
score (E/A) of .53 is obtained and on the WC test we get an 
interference score of .92. In both tests, of course, if the mixed 
tasks and the single tasks were both equally difficult, E/A 
should equal .50. At first glance, therefore, it would seem that 
flexibility could be a factor on the WC tests but not on the SDN 
tests. This is an erroneous impression, for in spite of the high 
average score on the SDN mixed test, it is necessary to know
how the mixed test correlates with both the Addition and
the Subtraction tests. If these correlations were appreciably 
lower than the correlations between the Addition and the Sub¬ 
traction tests, it would indicate that some factor other than 
ability to add and subtract entered into the mixed task to 
lower the correlations. These correlations are, however, essentially
equal. The same may be said for the WC tests. In this
case since r₈₋₉, r₈₋₁₀, and r₉₋₁₀ are all about equal in magnitude,
indications are that a factor of flexibility need not be postulated
to account for the relatively slow performance on the Mixed
tests. Our factor analysis, of course, confirms this throughout
the battery. Thus, it is felt that in factorial investigations
scoring formulae such as those described above are not only 
unnecessary but that actually these procedures may obscure 
data which can give valuable information. 

Summary and Conclusion 

The purpose of this study was to investigate the nature of 
flexibility by factorial methods. A battery of 13 tests was con¬ 
structed. These tests were designed to measure numerical, per¬ 
ceptual speed, and verbal factors. Within each area the attempt 
was to make some of the tests univocal (factorially pure). 
One test of each type, however, was designed to measure flexibility
by requiring S to shift from one simple task to another.
The tests were designed for machine-scoring and were speed
tests. S’s were 205 college students. Test scores were intercorrelated
and the matrix of intercorrelations was factorially
analyzed. Four factors were extracted. These were identified 
as: (P) Perception, (V) Verbal, (SDN) Single Digit Number, 
and (TDN) Two Digit Number. Those tests which required 
a shifting of tasks could be accounted for on the basis of the 
above four factors; consequently the postulated factor of flexi¬ 
bility common to the different types of tasks was not necessary 
to account for the obtained results. 

In reference to scoring, the position is maintained that the 
various difference scores and ratio-scoring techniques used by 
other investigators are not necessary in factorial investigations 
of flexibility and indeed may obscure the essential relationship 
between the flexibility and nonflexibility performance.

REFERENCES 

1. Cattell, R. B. “Temperament Tests. II. Tests.” British Journal of
   Psychology, XXIV (1933), 20-49.

2. Cattell, R. B. Description and Measurement of Personality.
   Yonkers-on-Hudson: World Book Co., 1946.

3. Kleemeier, L. B. and Kleemeier, R. W. “Effects of Benzedrine Sulfate
   (Amphetamine) on Psychomotor Performance.” American
   Journal of Psychology, LX (1947), 89-100.

4. Notcutt, B. “Perseveration and Fluency.” British Journal of
   Psychology, XXXIII (1943), 200-208.

5. Spearman, C. The Abilities of Man. New York: Macmillan Co., 1927.

6. Walker, K. F., Staines, R. G. and Kenna, J. C. “P-Tests and the
   Concept of Mental Inertia.” Character and Personality, XII
   (1943), 32-42.

7. Yerkes, R. M. (ed.) Memoirs of the National Academy of Sciences,
   XV (1921), 207-230.



THE STANDARDIZATION OF THE MOORE EYE-HAND
COORDINATION AND COLOR MATCHING TEST 1 

JOSEPH E. MOORE 
Georgia Institute of Technology 

The Moore Eye-Hand Coordination and Color-Matching Test 
was originally developed to measure the speed of eye-hand 
coordination of small children, and it was found in subsequent 
studies to differentiate clearly the same factor in adults. It
was thought that if a test of eye-hand coordination could be
devised which would stimulate immediate interest, it would
prove valuable in measuring certain differences in young children
in whom this type of learning has not occurred to any
great extent, or has not become highly specialized.

In order to devise a test which would appeal strongly to 
young children it was decided to utilize their interest in mar¬ 
bles. The first test that was constructed was a bulky affair 
and difficult to manipulate. By a process of trial and revision 
the instrument has been markedly changed and, it is to be 
hoped, improved. The pre-school and the adult tests are identical
except in length. The pre-school, or short form, has been used
to test both white and Negro children as young as two years of
age. Motivation is rather easy, since children generally take a
keen delight in picking up the marbles and putting them in the
holes.


1 This project was made possible in part through a grant-in-aid allocated by a Research
Committee at the Georgia Institute of Technology from funds made available
jointly by the Carnegie Foundation and Georgia Institute of Technology. The author,
however, and not Georgia Institute of Technology, is solely responsible for statements
made in this report.

The writer wishes to acknowledge the assistance and cooperation of the following
individuals: President Robert P. Daniel and Mr. William N. Smith, Personnel Counselor,
Shaw University; Prof. Dorinda Duncan, Tuskegee Institute; Dr. Susan Gray,
George Peabody College; Dr. Sidney Q. Janus and Prof. Albert S. Glickman, Georgia
Institute of Technology; Prof. Herman Long, Fisk University; Dr. C. W. Thomasson,
Drexel Institute; Dr. R. R. Ullman, Wittenberg College; and Prof. Joseph L. Whiting,
Atlanta University.





The following picture shows the adult or long form of the test. 



Fig. I. Eye-Hand Coordination and Color Matching Test 

It will be seen that the test is a rectangular board l6-| x 
19I inches. The thickness of the test is slightly over f inch. 
There are four rows of one-half inch holes. Each row contains 
eight holes spaced l-| inches apart. There are four starting 
boxes or slots holding the marbles, one at the end of each row 
of holes. Each box holds eight marbles for the Speed Test and
twelve marbles for the color-matching part of the test. 

The Color-Matching Test operates in the following way: 
Under each hole there is a colored piece of paper covered by 
transparent tape. The colors, in order, are red, green, blue, and 
yellow for the first row of eight holes. The second row is green, 
blue, yellow, red, the color sequence being different for each 
row. 

The Pre-School Test is actually half as long (16 marbles in 
each trial are used instead of 32) as the adult form. The child 
is seated comfortably and is told to watch as the examiner shows 
him how to play the game. The examiner then takes one marble
at a time and puts it in the hole so as to give the impression
that it is fun to play the “game” fast. The child is then permitted
to take a practice trial on the first eight marbles. The
test score is the total number of seconds it takes a child to do 
the 16-hole test three times, placing the marbles in consecutive 
order. A Pre-School Test can be made by covering one-half 
of the Adult Test with a piece of cardboard. 

The norms for the Pre-School Test were based on the scores 
of children from nursery schools, kindergartens, and lower¬ 
elementary schools from the states of Tennessee and Kentucky. 

From Table 1 it is seen that the average time for each age
group becomes progressively faster. Comparison of the means 
with medians shows that every group except one is negatively 
skewed. 2 The range in scores indicates the extremes that are 
found in the reaction time of children within the age range 
studied. The standard deviation tends to become progressively 


TABLE 1

Speed, Measured in Seconds, of Eye-Hand Coordination of 431 Children on the
Pre-School Form

Age in     Number of     Range of                          Standard
Months     Children      Speed        Median     Mean      Deviation
24-29         10         141-448      215        225         94.7
30-35         25         122-372      177.5      177.5       58.2
36-41         45          88-275      142        156.7       48.7
42-47         56          90-210      135        137.6       26.7
48-53         47          81-205      120.9      123.3       29.4
54-59         54          76-222      106.3      112         27.0
60-65         49          75-191      100        108         26.2
66-71         38          66-146       88         93.5       22.5
72-77         78          60-110       83.4       84.6       10.7
78-83         29          65-95        79.1       80.9        7.6


smaller for each succeeding higher age group represented in 
the sample. 

The Long Form or Test for Adults

The long form of the test requires placing 32 marbles, one at 
a time, in consecutive order in the holes. The test is taken 
in a seated position and has been standardized at typing-table 
height, or approximately 26 inches. The subject first has a 
practice trial of a row of eight marbles. The individual’s score 
is the total number of seconds it takes him to complete three 
runs of 32 marbles each. 

The long form of the test has been employed to measure the 
speed of eye-hand coordination of children in both elementary 
and high schools. The data on the performance of individuals 

2 Scores represent the number of seconds necessary to do the test. The fewer the
seconds the faster the performance. The distribution therefore represents slow scores
at the left and fast scores at the right.

between the ages of six years and sixteen years are presented
in Table 2.

Each age group represented in the above sample (Table 2) 
completes the test progressively faster than the next younger 
age group. If the small sub-group samples are representative 
of the corresponding larger populations it would seem that the 
smallest changes in speed and precision occur in the younger 
age groups, ages six through ten, and the greatest between the 
ages of eleven to sixteen. It will be noted that four of the age 
groups are positively skewed, the median being larger than the 
mean. The age groups which are positively skewed are six, seven,
thirteen, and sixteen. The small sampling could account for a 
part or all of the skewness. 
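
The statement about skewness can be verified directly from the medians and means of Table 2. The short sketch below lists the ages at which the median exceeds the mean; the variable names are chosen here for illustration only.

```python
ages = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
medians = [166.3, 145.0, 133.0, 130.0, 120.0, 116.5,
           115.9, 110.8, 107.3, 103.8, 95.4]
means = [162.6, 144.5, 135.0, 132.9, 123.3, 118.1,
         116.1, 109.7, 108.9, 104.5, 93.3]

# Keep the ages whose median is larger than the mean.
median_above_mean = [
    age for age, median, mean in zip(ages, medians, means) if median > mean
]

print(median_above_mean)
```

The ages returned are six, seven, thirteen, and sixteen, the four groups named in the text.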

TABLE 2*

Speed, in Seconds, of Eye-Hand Coordination for 602 Subjects Aged Six
Through Sixteen Years

        Number of
Age     Subjects       Range               Median     Mean      S.D.
  6        33          121-200             166.3      162.6     19.7
  7        58          110-195             145.0      144.5     15.5
  8        66          105-175             133.0      135.0     15.2
  9        42           93-160             130.0      132.9     14.9
 10        18          101-142             120.0      123.3     12.1
 11        38          100-135             116.5      118.1      7.7
 12        45           81-150             115.9      116.1     12.1
 13        43           84-130             110.8      109.7      9.2
 14        28           83-141             107.3      108.9     12.9
 15    [illegible]     [illegible]-133     103.8      104.5     11.7
 16       207           75-125              95.4       93.3     11.0

* Detailed norms are given in the Manuals.

Data available on the long form of the test indicate that it 
can be considered reasonably well standardized, at least on 
Southern men and women. The data from two Northern schools, 
Drexel Institute and Wittenberg College, are so similar that it 
does not appear that any great divergence of central tendencies
and variability is to be expected in other areas. Further studies
are encouraged, however, to prove the accuracy of this
assumption. 

The data that have been accumulated on the long form of the 
Moore Eye-Hand Coordination and Color-Matching Test are 
presented in detail for adult subjects in Table 3. Separate 
norms have been presented for white and Negro subjects. The 
justification for the separate norms for whites and Negroes was 
the fact that the difference between the average time of the 
two groups favored the whites on both the speed and the color-matching
tests. The difference in performance of the whites
was statistically significant at the 1 per cent level for all comparisons
except that between white women and Negro women
on speed, in which instance the difference was not statistically 
significant. 

Table 3 reveals that on the speed of eye-hand reaction, 
women are faster than men. White men did the test more 
rapidly than Negro men and white women did the test more 
rapidly than Negro women. These differences favoring the 


TABLE 3

Norms for Adults on the Long Form of the Moore Speed of Eye-Hand Coordination Test

                               White Men                        Negro Men
                      College  Non-College  Busi. & Ind.   College  Non-College
Number of Subjects      776       1,707        1,222         451        108
Range                 73-1?3     70-180       75-175        80-150     85-180
Median                 96.18     104.30       104.04        100.92     109.0
Mean                   96.72     106.00       103.26         99.20     111.0
S.D.                    8.85      12.00        10.48          9.17   [illegible]

                           White Women            Negro Women
                      College   Busi. & Ind.        College
Number of Subjects      3?4        348                280
Range                 74-144     74-131             64-136
Median                 94.27      98.5               95.61
Mean                   95.59      99.0               96.25
S.D.                    9.55       9.25              10.15



white men are statistically significant. The difference between
the means of college women favored the faster performance of
the white women but the difference is not statistically signifi¬ 
cant. Negro women performed the test somewhat more rapidly 
than did men in the college groups. The greatest differences in 
speed of eye-hand coordination are between college and non¬ 
college groups rather than between racial groups. 

The non-college white males were men who came through 
the Georgia Tech Guidance Center and were being considered 
for work calling for some type of manipulative skill. The scores
of these men were negatively skewed. In short, these men did 
fairly well in performance calling for quick and accurate manip¬ 
ulation insofar as such factors were measured by the Moore 
test. It will be seen that Negro college men also worked much 
faster than the non-college Negro group. 





The Color-Matching Test 

The Color-Matching Test has been developed during the 
last eight years at the request of certain industrial firms. The
test requires that the individual match a marble of a specific
color with a hole of the same color. The colors, as was mentioned
previously, are arranged in an irregular order. The four
colors used are red, green, blue, and yellow. The score on the 
color-matching part of the test is the number of seconds re¬ 
quired to complete the test; that is, to match the 32 marbles 
with the 32 colored holes three times. If a mistake is made, 
such as placing a red marble in a yellow hole, one second is 


TABLE 4

Norms for Adults on Speed of Color-Matching

                               White Men                        Negro Men
                      College  Non-College  Busi. & Ind.   College  Non-College
Number of Subjects      368       1,181         701          451         81
Range                 87-170     90-349       90-304        91-258    100-280
Median                114.52     132.9        132.02        123.72     152.0
Mean                  115.54     137.1        133.50        125.01     158.0
S.D.                   11.00      24.8         18.42         15.35      36.0

                      White Women     Negro Women
                        College         College
Number of Subjects        322             280
Range                   80-214          84-190
Median                  110.00          118.20
Mean                    110.02          121.33
S.D.                     11.70           16.20


added to his score for each such error. The subject is not per¬ 
mitted to arrange the marbles in a definite order previous to 
the starting signal. 
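
The scoring rule for the color-matching part amounts to a sum of the three trial times plus a penalty of one second per error. A minimal sketch follows; the function name, the trial times, and the error count are hypothetical and serve only to illustrate the rule.

```python
def color_matching_score(trial_seconds, errors):
    """Total seconds over the three trials plus one second per misplaced marble."""
    if len(trial_seconds) != 3:
        raise ValueError("the test consists of three trials of 32 marbles each")
    return sum(trial_seconds) + errors


# Hypothetical subject: trials of 40, 38, and 37 seconds with two color errors.
score = color_matching_score([40, 38, 37], errors=2)

print(score)
```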

Table 4 presents the data on the color-matching test with 
separate norms for white and Negro groups. The white college 
groups did the test more rapidly than the Negro college groups. 
The differences between the respective means were statistically 
significant at the 1 per cent level. 

As would be expected, color-matching takes longer than the 
simple Speed Test. It takes an individual between five and ten 
seconds longer per trial to match the colors, or from fifteen to 
thirty seconds longer for the three trials. A comparison of 
Tables 3 and 4 reveals that the differences among the various 
groups are more pronounced on the Color-Matching Test than
on the simple Speed Test; especially is this the case in the com¬ 
parison of college and non-college white men. 

Validity 

The validity of the Moore Eye-Hand Coordination Test was 
investigated in a number of ways. Age differentiation was one 
criterion. The test differentiated between the various age groups 
from 24 months to sixteen years and older. As each group of 
subjects took the test those who were older tended to make 
faster scores. After the sixteenth year age did not seem to have 
any appreciable relation to speed on the groups included in 
this study. 

In the business and industrial field two studies are available 
on validity. In one study ten ice cream sandwich makers took 
the Speed Test and the scores were correlated with the number 
of dozens of sandwiches each turned out in a specified time. The 
coefficient of correlation was .52. The second study dealt with
23 loom operators or weavers. The group was divided into 
those above and below average on the speed of color-matching 
and above and below $1.18 in hourly earning rate. A tetra- 
choric correlation of .86 was obtained. 
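
A tetrachoric coefficient of this kind is estimated from the four cell frequencies of the two-by-two table. The cell frequencies of the weaver study are not reported, so the counts below are invented purely for illustration, and the cosine-pi approximation used here is one common shortcut, not necessarily the method the author employed.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to tetrachoric r for a 2 x 2 table.

    a and d are the concordant cells (above-above, below-below);
    b and c are the discordant cells.
    """
    if b * c == 0:
        raise ValueError("approximation is undefined when a discordant cell is empty")
    odds = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds)))


# Invented counts for 23 workers (not the data of the study).
r_example = tetrachoric_cos_pi(10, 2, 2, 9)
# Equal cells should give a coefficient of essentially zero.
r_no_relation = tetrachoric_cos_pi(5, 5, 5, 5)

print(round(r_example, 2), round(r_no_relation, 2))
```

With so few cases in each cell, a coefficient of this size rests on very little information, whichever method of estimation is used.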

The correlations of the Moore tests with other dexterity 
tests were also used as indirect evidences of validity. The speed 
of eye-hand coordination correlated .51 with the Pennsylvania 
Bi-Manual Work Sample (assembly) on 317 adult male sub¬ 
jects. On the Minnesota Rate of Manipulation Test the coeffi¬ 
cient for placing was .67 for 157 subjects, and for turning it 
was .45 for 191 cases. The color-matching part of the Moore 
Test gave the following correlation coefficients with other tests: 
O'Connor Tweezer, .54 for 133 subjects; Minnesota Rate of 
Manipulation (placing), .53 for 103 men. The Pennsylvania 
Bi-Manual Work Sample (assembly) correlated .51 for 237 individuals
and for disassembly, .50.

Reliability 

The reliability of each of the three trials of the test was also 
studied on 441 men. The scores for trial one (32 marbles) were 
correlated with the scores for the second trial (32 marbles), and 
a coefficient of .83 was found. Scores for trial two were correlated
against scores for trial three and a coefficient of .77 was
obtained. Scores for trial one were then correlated with scores 
for trial three and a coefficient of .82 was revealed. From these 
data it would appear that the instrument is doing a fairly con- 
sistent job of testing, even if the obtained speed on the first 32 
marbles is taken as a criterion of actual speed. 

The reliability of the color-matching part of the test was com¬ 
puted on the scores of 83 college juniors and seniors by the 
test-retest method. Total scores (96 marbles) obtained one 
week apart were found to give a coefficient of correlation of 
.82. When this is corrected for the restricted range, the coeffi¬ 
cient becomes .955. 
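
The article does not state which correction for restricted range was applied, nor the standard deviations involved. One standard formula, which assumes that the error variance is the same in the restricted and the unrestricted group, is R = 1 - (s²/S²)(1 - r), where s and S are the restricted and unrestricted standard deviations. The sketch below applies it; the function name is hypothetical, and the 2:1 ratio of standard deviations is inferred here because it reproduces the reported figure, not because it is reported.

```python
def reliability_in_wider_group(r_restricted, sd_restricted, sd_unrestricted):
    """Reliability expected in a group of wider range, assuming equal error variance."""
    variance_ratio = (sd_restricted / sd_unrestricted) ** 2
    return 1.0 - variance_ratio * (1.0 - r_restricted)


# Assumed: the unrestricted S.D. is twice that of the college sample.
corrected = reliability_in_wider_group(0.82, sd_restricted=1.0, sd_unrestricted=2.0)

print(round(corrected, 3))
```

Under these assumptions an obtained coefficient of .82 rises to .955.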

The Speed of Eye-Hand Coordination and Color-Matching parts of the
Moore test correlated .67 with each other in a group of
364 adults. It would appear that the speed element is playing
a major part in both tests.

The reliability of the Pre-School Form was checked on 81 
children drawn from the pre-school group and the first two 
grades of elementary school. The test-retest method after a 
period of one week gave a coefficient of reliability of .95. 

Summary 

1. The Pre-School Form of the Moore Speed of Eye-Hand
Coordination Test differentiates the performance of children
between ages of 24 and 72 months. The reliability of the Pre- 
School Test for 81 children by the test-retest method after one 
week was .95. 

2. The long form of the test is able to differentiate between 
each age group for ages six through fifteen years. After the six¬ 
teenth year speed does not appear to be very closely related to 
age for the groups included in this study. 

3. The validity of the Speed Test has been investigated by 
using such criteria as age differentiation, correlation of the 
speed and production of a group of ice cream sandwich makers 
(.52), and correlation with the Minnesota Rate of Manipulation 
(placing) for 157 subjects (.67). On the Color-Matching Test 
a group of 23 weavers was divided into above- and below-aver- 
age groups on the color-matching test scores and above- and 
below-average groups on hourly earning rate, and yielded a 
tetrachoric correlation coefficient of .86. 





4. The reliability of the Moore Test was determined by the 
test-retest method after a lapse of one week. The correlation 
coefficients ranged from .95 to .72. 

5. Coefficients of correlation between each of the three separate
trials were obtained on a group of 441 men and used as
partial measures of reliability. Trial one correlated with trial 
two showed a coefficient of .83; trial two with trial three, .77; 
and trial one with trial three, .82. 

The Moore Eye-Hand Coordination and Color-Matching Test is produced
and distributed by The California Test Bureau.



AN INVESTIGATION OF A COUNSELOR ATTITUDE 
QUESTIONNAIRE 1 


WILLIAM A. McCLELLAND

Brown University 
and 

H. WALLACE SINAIKO 
New York University 

Introduction 

Attitudes held by a counselor toward his own behavior in 
counseling situations, and toward various counseling techniques,
can have marked implications for effective counseling. How¬ 
ever, definitive studies of counselor behavior are practically 
non-existent. It is the purpose of this study to determine the 
effectiveness of one technique in the quantitative measurement 
of these attitudes. Several applications of this technique, the 
questionnaire method, have been attempted and will be dis¬ 
cussed. 

Implicit in the questionnaire approach to the investigation 
of counselor attitudes is the assumption that “correct” re¬ 
sponses can be determined. In an investigation of this problem 
Chase 2 had 34 judges, “selected because of their known under¬ 
standing of and ability in counseling,” respond to a 101-item 
Questionnaire he had constructed. Typical items from the 
Chase Questionnaire are as follows: 

12345 Permitting the counselee to express himself freely. 

12345 Reprimanding the counselee for displaying aggres¬ 
sion. 

12345 Advising the counselee to stay on the safe side and 
not take chances. 

The five numbers before each item represent the counselors’
attitude toward the practice as follows: 1, Decidedly harmful;

1 This paper was presented at the Midwestern Psychological Association meeting
May 8, 1948.

2 Chase, Wilton P. “Measurement of Attitudes Toward Counseling.” Educational
and Psychological Measurement, VI (1946), 467-473.

2, Probably harmful; 3, Doubtful; 4, Probably good; 5, De¬ 
cidedly good. A scoring key was developed by counting as 
"correct” all single ratings of items that received a clear major¬ 
ity of the judges’ responses. If two adjacent ratings of an item 
received a clear majority, both were scored as “correct” re¬ 
sponses. Chase was then able to give his Questionnaire to 
counselor trainees, to compare their responses with those of 
the 34 judges, and to derive quantitative "scores” (or indices 
of agreement) with the judges. 
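
The keying rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not define a “clear majority,” so the sketch takes it to mean more than half of the judges, and the function names (`key_item`, `agreement_score`) and the sample ratings are hypothetical.

```python
from collections import Counter

def key_item(ratings, majority=0.5):
    """Return the set of ratings keyed "correct" for one item (may be empty).

    Assumption: a "clear majority" means a share of judges above `majority`.
    """
    counts = Counter(ratings)
    n = len(ratings)
    # A single rating chosen by a clear majority of judges is keyed alone.
    for rating in range(1, 6):
        if counts[rating] / n > majority:
            return {rating}
    # Otherwise look for two adjacent ratings that together hold a majority,
    # keeping the best-supported pair.
    best_pair, best_share = set(), majority
    for low in range(1, 5):
        share = (counts[low] + counts[low + 1]) / n
        if share > best_share:
            best_pair, best_share = {low, low + 1}, share
    return best_pair

def agreement_score(responses, key):
    """Count the items on which a trainee's rating falls in the keyed set."""
    return sum(1 for item, r in responses.items() if r in key.get(item, set()))

# Hypothetical ratings from 34 judges on three items.
judges = {
    1: [5] * 20 + [4] * 14,            # single rating with a majority
    2: [5] * 15 + [4] * 15 + [3] * 4,  # two adjacent ratings share the majority
    3: [1] * 10 + [3] * 12 + [5] * 12, # no agreement: item is not scorable
}
key = {item: key_item(r) for item, r in judges.items()}
print(key)
print(agreement_score({1: 5, 2: 3, 3: 3}, key))
```

Items for which the function returns an empty set correspond to items that could not be scored.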

The above method of determining “correct” responses in an
attitude questionnaire is subject to qualification. First, keying 
items is undoubtedly some function of the judges’ training and 
experience, their temperaments, and their philosophies of coun¬ 
seling. Thus, it is reasonable to expect considerable variation 
in keys derived from different groups of judges. Second, the 
original “set” given the judges might well influence their re¬ 
sponses. Chase instructed the judges to “Keep in mind that in 
every one of the items a general situation is described, and one 
is therefore not to think in terms of individual cases.” But 
counseling is not in terms of a general situation. It is conceivable
that the same counselor dealing with a very dependent
student would behave in one way, yet in counseling a how-to-study
case might perform quite differently. In short, does the
“general counseling situation” exist at all? If the same counselor
can have different sets of attitudes for different counseling
contacts, then there might be need for as many “correct” 
questionnaire responses as there are diagnostic categories and 
counseling philosophies. 

For purposes of the present study it was assumed that judges 
are capable of making responses to the Chase Questionnaire 
items in terms of a general counseling situation. If a meaning¬ 
ful key could be derived, the use of a questionnaire of this 
type might be of considerable value in the selection and train¬ 
ing of counselors. In an agency where counselors specialize in 
one or more problem areas, inspection of the individual re¬ 
sponses and total score would be of value in the placement of 
counselors. In a counselor-training situation early identification 
of attitudes at variance with local, empirically defined, “cor¬ 
rect” attitudes might facilitate the orientation of the training. 
These are two possible uses of such an instrument. 





Method and Subject 

A list of expert counselors, each of whom was to re-evaluate 
the Chase Questionnaire items, was compiled. The group in¬ 
cluded only persons who had had at least ten years’ counseling 
experience in and around the University of Minnesota, or who 
had obtained the Ph.D. degree in Personnel Psychology at that 
institution. The thirteen expert counselors selected may be 
characterized as homogeneous by training and counseling ex¬ 
perience. Four of the group were academic instructors in coun¬ 
seling courses, four were full-time counselors, and five divided 
their time between personnel administration and counseling. 
The judges were given a shortened form of the Chase Question¬ 
naire: ten of the original 74 scorable items were eliminated be¬ 
cause they were specific to military separation counseling. 

A “Minnesota key” for scoring the questionnaires was obtained
as follows: The mean and standard deviation were computed
for each of the 64 items. Responses were weighted on
the five-point scale described above. Those items were elim¬ 
inated which had a standard deviation of .8 or larger (arbi¬ 
trarily selected since these items could have more than two 
“correct” responses). Inspection of the distributions of judges’ 
ratings supplemented this application of summary statistics. 
Forty items remained on which there seemed sufficient agree¬ 
ment between the judges so that either one single or two ad¬ 
jacent ratings could be scored as “correct.” 
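
The screening step can be illustrated with a short Python sketch. It assumes the population form of the standard deviation (the paper does not say which form was used); the function name `screen_items` and the ratings are hypothetical, and the supplementary inspection of the distributions is not modeled.

```python
from statistics import mean, pstdev

def screen_items(judge_ratings, sd_cutoff=0.8):
    """Keep items whose judges' ratings have a standard deviation below the cutoff.

    Returns {item: (mean, sd)} for the retained items.
    """
    retained = {}
    for item, ratings in judge_ratings.items():
        sd = pstdev(ratings)  # assumption: population standard deviation
        if sd < sd_cutoff:
            retained[item] = (mean(ratings), sd)
    return retained

# Hypothetical ratings from thirteen judges on two items.
ratings = {
    "A": [5] * 9 + [4] * 4,                        # close agreement
    "B": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3, 3, 3],  # wide disagreement
}
kept = screen_items(ratings)
for item, (m, sd) in kept.items():
    print(item, round(m, 2), round(sd, 2))
```

With these made-up ratings only item A survives the .8 cutoff.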

The subjects of the investigation were students in counseling 
courses and counselors at the University of Minnesota. They 
were administered the 64-item Questionnaire in the spring of 
1947 during the first week of the new term. Subjects came from 
two sources: 106 were students in either of two courses dealing 
with guidance techniques and counseling practices, and 53 were 
graduate students in Psychology or Educational Psychology 
who were either taking graduate courses in Counseling or were 
engaged in half-time college counseling. The former group of
students answered the Questionnaire a second time, at the final 
session of the course in which they were enrolled. The students’ 
questionnaires were scored with the Minnesota key and with 
the Chase key for the forty scorable items. 





Results 

In the first part of the study, summary statistics for the 40 
items keyed by the Minnesota judges, and these same items 
keyed by Chase’s judges, showed limited spread. The combined 
group of students (graduate and undergraduate) had an aver¬ 
age score of 30 and a standard deviation of 3 items on the 
Minnesota key, and an average score of 27 with a standard 
deviation of 4 on the Chase key. As scores obtained from the 
two keys correlated .20 ± .09, it would appear that the keys 
are quite dissimilar. However, inspection reveals two facts: 
Twenty-four of the 40 items had two adjacent ratings keyed
“correct” by Minnesota judges, while only seven of the same
40 items had double ratings on the Chase key. On the Chase
key, 32 of the 40 items have one extreme or the other keyed
as “correct,” while Minnesota judges used the extreme responses
for only 24 of the 40 items. These facts, the greater
tendency to key adjacent responses as “correct” by Minnesota
judges and a reluctance to key extreme values on the part of 
these judges, could account for the higher mean and smaller 
variability of the Minnesota scale and possibly for the low 
correlation between the latter scale and that of Chase. 

To test the hypothesis that the two keys were really different,
they were compared in terms of a trichotomy of “good,” “harmful,”
and “doubtful.” Under such a comparison only three of
the 40 items were classified differently. Three counseling practices
rated “good” by Chase’s judges were considered of “doubtful”
value by the Minnesota raters.

One other possibility was suggested as an explanation for the 
low inter-scale correlation, namely, the reliability of the 40- 
item Questionnaire. Reliability estimated in several ways 
(Kuder-Richardson and split-half uncorrected) turned out to 
be about .20. To achieve minimum satisfactory reliability the 
number of items would have to be increased fivefold. Assuming 
the appropriateness of these tests of reliability, it appears that 
the two scales correlate about as highly with each other as 
single administration reliability allows. 
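
The arithmetic behind these statements can be sketched as follows. The text does not name its formulas, so the sketch assumes that the effect of lengthening is estimated with the Spearman-Brown formula and that the ceiling on the inter-scale correlation is the usual attenuation bound, the square root of the product of the two reliabilities. The function names are illustrative.

```python
import math

def spearman_brown(r, k):
    """Estimated reliability of a test lengthened k-fold (Spearman-Brown)."""
    return k * r / (1 + (k - 1) * r)

def max_observable_correlation(r_xx, r_yy):
    """Upper bound on the correlation between two measures of given reliability."""
    return math.sqrt(r_xx * r_yy)

reliability = 0.20
print(round(spearman_brown(reliability, 5), 3))
print(round(max_observable_correlation(reliability, reliability), 3))
```

Under these assumptions a fivefold increase in length raises a reliability of .20 to roughly .56, and two scales each with reliability .20 cannot be expected to correlate above about .20, which is the observed inter-scale correlation.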

The writers feel that further analysis of such low reliability
is not called for. Therefore, in spite of the unreliability of its
principal instrument, this study is presented for whatever interest
it may be to the reader.

The second part of the study involved an assessment of the 
effects of instruction upon counseling attitudes. The Question¬ 
naire was administered both before and after the undergraduate 
courses were given. In these two classes the mean scores ob¬ 
tained with the Minnesota key were about 30, with a standard 
deviation from 3 to 4 items on the pre-course administrations. 
In one class there was a slight, but not statistically significant, 
movement of the post-course mean score upwards toward the 
score of the instructor. In this class the post-course score correlated
.11 ± .25 with final course grades. In the second class
the post-course mean score was significantly lower (C.R. =
2.8) than that group’s pre-course mean score. Just why there
should have been movement away from the instructor’s score
in this second class is not readily apparent. It may be that this
instructor was somewhat inconsistent in answering the Questionnaire
and in his actual teaching practices; or, simply, that
the scale itself is too unreliable. In this second group there appeared
to be a moderate degree of relationship between post-course
score and course grades (r = .42 ± .10). In any event
these data offer equivocal evidence in support of the hypothe¬ 
sis that counseling attitudes can be modified by training, al¬ 
though it is clearer that subject-matter examinations in these 
two courses are not satisfactory measures of those attitudes. 
Further use of the Questionnaire with control groups would be 
helpful. 

A third estimate of reliability (which suggests the earlier two 
are underestimates) is offered by the correlation between pre¬ 
course and post-course scores for the 106 undergraduate stu¬ 
dents, r = .52. The amount of time elapsed during the courses 
was nine weeks. 

The final problem investigated was the relationship of the 
amount of training and experience in counseling to scores on 
the Questionnaire. The two groups which were compared were 
the 106 undergraduates and 53 graduate students. The Minne¬ 
sota key mean-raw-score difference between the two groups is 
statistically significant (C.R. = 2.8), with the graduates getting 
the higher scores. This is evidence for the common-sense hypothesis
that the longer a student studies in a particular school
and/or discipline, the more likely he is to acquire the attitudes
of his instructors. Whether or not this greater agreement with
the judges is a result of instruction, extra-curricular reading, or
personal counseling experience cannot be answered from these
data.

Interpretation and Conclusions 

1. Although it was possible to obtain considerable agree¬ 
ment among a carefully selected group of judges on the de¬ 
sirability of certain counseling practices, two obvious limita¬ 
tions of the questionnaire approach to the measurement of 
counselor attitudes must be mentioned. First, most of the 
judges spoke about the artificiality of rating practices in a 
“general counseling situation.” They reported that specifica¬ 
tion of the type of client problem, as well as the nature of the 
agency function, seemed important in keying “correct” re¬ 
sponses. Second, the low reliability of the scale makes the cur¬ 
rent approach suspect. Perhaps more rigorous item construction 
and analysis might yield more consistent results. 

2. There is equivocal evidence that students’ attitudes
toward counseling practices are susceptible to change with
formal course training, and these attitudes are not markedly
related to grades in counseling courses.

3. Scores on a scale of counselor attitudes may have some 
value in differentiating the more-experienced from the less-ex¬ 
perienced counselors or trainees in terms of a given set of 
“correct” responses that have been empirically derived for a 
local situation. 

4. The reservations attendant to the use of the Chase items 
about counselor attitudes are such as to indicate that they 
should not be used in their present form. While the approach 
has possibilities, this study suggests the questionnaire analysis 
of counselor attitudes requires considerably further investiga¬ 
tion before it can be accepted as a useful, reliable tool. 



A NOTE ON THURSTONE’S METHOD OF COMPUTING 
THE INVERSE OF A MATRIX 

WILLIAM C. COTTLE 
University of Kansas 


A research worker seeking a concise method of computing
the inverse of a matrix will find this in Thurstone’s method.
The method is applicable to a matrix of any size. Such a person
may find also, upon examining this method as outlined by
Thurstone, that it would appear to be a rather esoteric solution. 1
Possibly, because of Thurstone’s familiarity with the
method, he overestimates the cognitive powers of his readers.
Clarification of one step in the process of computing the inverse
would simplify the method. It is for this purpose that
this paper has been written.

1 Thurstone, L. L. Multiple Factor Analysis. Chicago: University of Chicago Press,
1947, pp. 46-48.

TABLE 1

Computations for Column II, Section B, of Thurstone’s Example for Computing the
Inverse of a Matrix*

        $a_{j2}$    $-c_{21} b_{j1}$               $b_{j2}$

          .48     -(.60)(.80)  = -.48       .000
          .80     -(.60)(.48)  = -.288      .512
          .36     -(.60)(.36)  = -.216      .144
Ch       1.64     -(.60)(1.64) = -.984      .656
Σ        1.64                    -.984      .656

          .00     -(.60)(1.00) = -.60      -.600
         1.00     -(.60)(0)    =  .00      1.000
          .00     -(.60)(0)    =  .00       .000
Ch       1.00     -(.60)(1.00) = -.60       .400
Σ        1.00                    -.60       .400

Ch       2.64                              1.056
Σ        2.64                              1.056

* Not rounded to significant figures as in Thurstone’s example (1, p. 47).
















TABLE 2

Computations for Column III, Section B, of Thurstone’s Example for Computing the Inverse of a Matrix*


Thurstone’s directions are simple to follow in setting up 
Section A of the example he gives. 2 To compute Section B and 
Section C, the steps are as follows: 

1. Column I of Section B is copied exactly from Column I of
Section A for both matrices.

2. The reciprocal, $1/b_{11}$, is the reciprocal of the first entry in
Column I of Section B, or 1/.80 = 1.25. This is recorded at
the bottom of this column as shown in Thurstone’s example.

3. Values for the first column of Section C are computed by
multiplying Column I of Section B by this reciprocal,

$c_{j1} = b_{j1}(1/b_{11})$.

(The reciprocal is constant for both the values of the original
matrix and those of the identity matrix.)

4. Column II of Section B is computed by the formula:

$b_{j2} = a_{j2} - c_{21} b_{j1}$

where $c_{21} = .60$ is constant for the entire column, both the
original matrix and the identity matrix. Computations
for Column II of Section B are shown in detail in Table 1.

5. The second column of Section C is computed by the formula:

$c_{j2} = b_{j2}(1/b_{22})$

where $b_{22}$ is the second entry in Column II of Section B.

6. Column III of Section B is computed by the formula:

$b_{j3} = a_{j3} - b_{j1} c_{31} - b_{j2} c_{32}$

where $c_{31} = .45$ and $c_{32} = .27$ are constants for each appropriate
column of Section B. Computations for Column III
of Section B are shown in detail in Table 2.

7. The rest of the computations can be followed from Thur¬ 
stone’s explanation. 
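
The Column II computation of step 4 can be checked numerically. The sketch below uses only the values shown in Table 1 (Column I of Section B and Column II of Section A, each followed by the corresponding identity-matrix entries); the function name `column_two` is an illustrative assumption, not part of Thurstone’s notation.

```python
def column_two(a_col2, b_col1, c21):
    """Apply b_j2 = a_j2 - c21 * b_j1 to every entry of the column."""
    return [a - c21 * b for a, b in zip(a_col2, b_col1)]

# Values read from Table 1: three rows of the original matrix, then
# three rows of the identity matrix.
b_col1 = [0.80, 0.48, 0.36, 1.00, 0.00, 0.00]  # Column I, Section B
a_col2 = [0.48, 0.80, 0.36, 0.00, 1.00, 0.00]  # Column II, Section A

c21 = b_col1[1] / b_col1[0]                    # .48 / .80 = .60
b_col2 = column_two(a_col2, b_col1, c21)

# Check row: the formula applied to the column sums should reproduce
# the sum of the computed entries.
check = sum(a_col2) - c21 * sum(b_col1)

print([round(b, 3) for b in b_col2])
print(round(sum(b_col2), 3), round(check, 3))
```

The agreement between the summed entries and the check value is the same safeguard the check rows provide in the hand computation.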

It is hoped that this explanation will enable anyone to follow 
this method of computing an inverse. The writer has used 
Tucker’s method 3 and this method in computing an inverse
of a matrix of the order of 8 X 8, and prefers the latter method.
He would suggest also that no rounding of figures be done 
until the computation of the inverse is reached. 

2 The writer is indebted to Dr. Clyde Coombs of the University of Michigan for
the information necessary to follow Thurstone’s explanation. The writer spent three 
days in company of two competent mathematicians in an unsuccessful attempt to 
follow the method before resorting to a letter to Dr. Coombs. 

3 Tucker, L. "A Method for Finding the Inverse of a Matrix.” Psychometrika , III 
(1938), 189-197. 



NOMOGRAPH OF PETERS AND VAN VOORHIS’ 
APPROXIMATION FORMULA FOR CORRECTING 
INTERFUNCTION CORRELATION COEFFICIENTS 
FOR HETEROGENEITY 

WILLIAM A. REYNOLDS 
National Broadcasting Company 

In setting up a testing procedure for selection and placement 
of employees in a large organization, it is often the practice to 
administer one or two tests to certain groups of applicants, and 
to add new tests to the schedule from time to time. Thus, when 
it is desired to construct a test battery from the results of a 
multiple-regression study, it is found that the populations to 
which the individual tests have been administered are larger 
than the population to which the two or more tests in combination
have been administered. The larger populations usually
have larger standard deviations; statistically they are more 
heterogeneous. Since they are the populations on which later 
test batteries will be validated, the information on hetero¬ 
geneity may be used to predict more accurately the true rela¬ 
tionships between two tests which have been administered to 
but a fraction of the number to which each separately has been 
administered. 

It is well known that the size of a coefficient of correlation is 
affected by the heterogeneity (range of talent) of the population 
on which it is computed. If a formula were available for correct¬ 
ing the correlation between two tests by taking into considera¬ 
tion the ranges of talent on both tests, a better estimate of the 
true correlation between them could be obtained. Such a formula
is available in Peters and Van Voorhis’ Statistical Procedures
and Their Mathematical Bases. 1

1 Peters, C. C. and Van Voorhis, W. R. Statistical Procedures and Their Mathematical
Bases. New York: McGraw-Hill Book Co., Inc., 1940. Pp. 108-110.

These authors develop their formula by considering first the
problem of estimating a corrected reliability coefficient. The
formula for the correction of a reliability coefficient for heterogeneity
is given by Peters and Van Voorhis in formula 109, as
follows:

$$\frac{\sigma_x}{\Sigma_x} = \frac{\sqrt{1 - R_{11}}}{\sqrt{1 - r_{11}}} \tag{109}$$

(Formula for correcting a reliability coefficient for heterogeneity)

where,

$\sigma_x$ = standard deviation of the shorter range of talent
$\Sigma_x$ = standard deviation of the longer range of talent
$r_{11}$ = reliability coefficient of the shorter range of talent
$R_{11}$ = reliability coefficient of the longer range of talent.
Any unknown term of this formula may be easily computed 
by nomograph 55 in the Handbook of Statistical Nomographs 
by Dunlap and Kurtz. 2 
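
For readers without the nomograph, the same quantity can be computed directly. Solving the formula for the reliability in the longer range gives R11 = 1 - (σx/Σx)²(1 - r11). The Python sketch below applies this rearrangement; the function name and the numerical values are hypothetical and serve only as an illustration.

```python
def corrected_reliability(r11, sd_short, sd_long):
    """Reliability expected in the longer range of talent.

    Rearranged from sd_short / sd_long = sqrt(1 - R11) / sqrt(1 - r11).
    """
    return 1.0 - (sd_short / sd_long) ** 2 * (1.0 - r11)

# Hypothetical values: reliability .80 in a group with S.D. 10,
# projected to a population with S.D. 15.
print(round(corrected_reliability(0.80, 10.0, 15.0), 3))
```

When the two standard deviations are equal the function returns the original coefficient unchanged, as the formula requires.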

The case of inter-function correlation is more complicated.
The assumption is made that the “variance of the distribution 
of true scores in the one function from their corresponding 
true scores in the other function is the same in the shorter 
range as it is in the longer one.” The following formulas are 
derived: 


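$$\frac{\sigma_x}{\Sigma_x} = \frac{\sqrt{R_{11x} - (R_{xy}^2 / R_{11y})}}{\sqrt{r_{11x} - (r_{xy}^2 / r_{11y})}}$$

$$\frac{\sigma_y}{\Sigma_y} = \frac{\sqrt{R_{11y} - (R_{xy}^2 / R_{11x})}}{\sqrt{r_{11y} - (r_{xy}^2 / r_{11x})}} \tag{130}$$

(Formula for correcting inter-function r’s for heterogeneity)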


Since these formulas involve reliability coefficients of the meas¬ 
urements in both functions for both ranges, and information 
regarding these usually is not available, a formula using ob¬ 
tained scores rather than true scores is presented: 


$$\frac{\sigma_x}{\Sigma_x} = \frac{\sqrt{1 - R_{xy}^2}}{\sqrt{1 - r_{xy}^2}} \tag{131a}$$

(Approximate formula for correcting inter-function r’s for heterogeneity)


Similarly,

$$\frac{\sigma_y}{\Sigma_y} = \frac{\sqrt{1 - R_{xy}^2}}{\sqrt{1 - r_{xy}^2}} \tag{131b}$$

where the assumption is made that the standard errors of
estimate are the same in the shorter range as in the longer one.

2 Dunlap, J. W., and Kurtz, A. K. Handbook of Statistical Nomographs.
Yonkers-on-the-Hudson: World Book Co., 1931.

Under the condition of having an $r_{xy}$ between two tests in a
shorter range of talent than the range of either test taken
separately, the $r_{xy}$ must be corrected twice: once for heterogeneity
in each test. Taking first the correction for heterogeneity
in the x variable, formula 131a may be expressed as
follows:

$${R'_{xy}}^{2} = 1 - \frac{\sigma_x^2 (1 - r_{xy}^2)}{(\Sigma_x)^2} \tag{A}$$

(Approximate correlation coefficient when the x variable has
been corrected for heterogeneity)

In turn, correcting $R'_{xy}$ in formula (A) for heterogeneity in the
y variable, we get:

$$R_{xy}^{2} = 1 - \frac{\sigma_y^2 (1 - {R'_{xy}}^{2})}{(\Sigma_y)^2} \tag{B}$$

(Approximate correlation coefficient when both the x and y variables
have been corrected for heterogeneity)

where $R_{xy}$ is the corrected coefficient of correlation when both
variables are corrected for heterogeneity, and $R'_{xy}$ is the coefficient
obtained from formula (A).

The solution of these equations is rather involved but can be 
estimated quite simply on the accompanying nomograph. The 
steps to be taken may be illustrated on the following problem. 

From the shorter range of talent, the group to which tests x
and y both were administered, the correlation coefficient and
the standard deviations of x and y were found to be

$r_{xy} = .65$, $\sigma_x = 12.0$, $\sigma_y = 18.0$

And from the longer range of talent, the whole populations to
which either of the tests was administered, the standard deviations
were found to be

$\Sigma_x = 15.0$, $\Sigma_y = 22.0$

The problem is to find:

$R'_{xy}$ = coefficient of correlation corrected for heterogeneity in
x, and

$R_{xy}$ = coefficient of correlation corrected for heterogeneity in
both x and y.

Step 1. Place a straightedge on the line at the left which
corresponds to the standard deviation obtained on the shorter
range of talent in the x variable ($\sigma_S = \sigma_x = 12$) across to the
correlation coefficient $r_S = r_{xy} = .65$. (The subscript S on the
nomograph refers to “shorter” or sample distribution, although
this standard deviation may occasionally be greater than that
of the “longer” distribution of the whole population to which
the test has been administered.)

Step 2. Place a pin on the middle reference line of the nomograph,
and pivot the straightedge so that it reads the standard
deviation obtained from the longer distribution ($\sigma_L = \Sigma_x = 15$)
of test x, and read off the corrected coefficient ($r_L = R'_{xy} = .80$)
from the scale at the right. This obtains $R'_{xy}$, the correction for
$r_{xy}$ when the x variable alone has been corrected for
heterogeneity.

Step 3. In turn, to correct the $R'_{xy}$ for heterogeneity in the y
variable, place a pin on the value for the corrected coefficient
($R'_{xy} = .80$) and pivot the straightedge so that it reads the
standard deviation of test y on the shorter range of talent
($\sigma_S = \sigma_y = 18$).

Step 4. Pivot again with a pin held on the middle reference
line; change the left side of the straightedge to read the standard
deviation on the longer range of talent of test y
($\sigma_L = \Sigma_y = 22$). The result, $R_{xy} = .87$, is read from the scale
at the right. This is the correlation coefficient between the two
tests when both have been corrected for heterogeneity.
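
The two-stage correction can also be carried out arithmetically, which gives a check on the nomograph readings. The Python sketch below applies the approximate formula once for each variable, using the values of the illustrative problem; the function name `correct_one_variable` is an assumption introduced here.

```python
import math

def correct_one_variable(r, sd_short, sd_long):
    """Correct r for heterogeneity in one variable.

    Uses R^2 = 1 - (sd_short / sd_long)^2 * (1 - r^2). If the shorter-range
    S.D. greatly exceeds the longer-range S.D. and r is small, the quantity
    under the root can become negative and no real solution exists.
    """
    ratio_sq = (sd_short / sd_long) ** 2
    return math.sqrt(1.0 - ratio_sq * (1.0 - r ** 2))

r_xy = 0.65
r_x_only = correct_one_variable(r_xy, 12.0, 15.0)    # heterogeneity in x
r_both = correct_one_variable(r_x_only, 18.0, 22.0)  # then heterogeneity in y

print(round(r_x_only, 2), round(r_both, 2))
```

Computed this way, the first correction comes to about .79, slightly below the graphical reading of .80, and the second to about .87, in agreement with the nomograph.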

Quite often on a matrix of intercorrelations such as would be 
obtained in the development of a test battery of aptitude tests, 
the inter-function correlation coefficients will be corrected up¬ 
ward when the correction for heterogeneity is made in one 
variable, but will be reduced when corrected again for the 
heterogeneity in the other variable. When this occurs, it will be 
caused by the standard deviation of the test group on one 
variable being larger than the standard deviation reported in 
the published norms or obtained on the larger industrial group 
to which the test has been administered. But in the cases where 
the standard deviations of the two restricted distributions are 
less than the standard deviations of the corresponding wider 
distributions, the corrected correlation coefficients always will 
be higher. 



Nomograph: Approximate Formula for Correcting Inter-function r’s for Heterogeneity



A SINGLE CHART FOR TETRACHORIC r 


WILLIAM LEROY JENKINS 
Lehigh University 

The widely-used Thurstone diagrams 1 for determining tetra- 
choric r are now out of print. As a substitute, a short-cut 
method has been devised which employs a single chart. 

Essentially, the chart compares the actual percentage-in- 
excess-of-chance in one cell with the percentages-in-excess-of- 
chance for r’s of .90, .80, .70, and .60. The interpolation is made 
graphically if the r is above .60 and arithmetically if the r is 
lower. 


Method with Examples 


Example A:   273   296*        Example B:   55    80
             423    24                      70*   20

1. Mark the number (*) in the upper right or lower left,
whichever number is smaller.

2. Compute the two percentiles at which the distributions
are cut.

   Example A: (296 + 273)/1016 = 56.0% (above)
              (296 + 24)/1016 = 31.5% (right)

   Example B: (70 + 20)/225 = 40.0% (below)
              (70 + 55)/225 = 55.6% (left)

3. Multiply the two percentiles to obtain the chance percentage
in the marked cell.

   Example A: 17.6%        Example B: 22.2%

4. Compute the actual percentage in the marked cell.

   Example A: 296/1016 = 29.1%        Example B: 70/225 = 31.1%

5. Subtract result 3 from result 4 to obtain the actual percentage
in excess of chance. Draw a vertical line downward
from this value on the scale at the top of Figure I.

   Example A: 11.5%        Example B: 8.9%

6. In each of the four sets of curves in Figure I: Find the
larger cutting percentile on the ordinate scale. Move across,
interpolating between the curves, to the smaller cutting percentile.
Drop a vertical to the baseline and mark this point.
(The four points represent respectively the percentages-in-excess-of-chance
for r’s of .90, .80, .70, and .60.)

7. Through the points marked on the four baselines, draw a
curve. Where the curve intersects the straight line drawn in
step 5, read off the tetrachoric r from the scale on the right.

   tetrachoric r = .89 (Example A)

8. If the curve does not intersect the vertical line, the tetrachoric
r is less than .60. Make an arithmetical interpolation as
indicated below.

Figure I (top scale: actual percentage in excess of chance). The dotted lines indicate the solution for Example A in the text.

1 Chesire, L., Saffir, M., and Thurstone, L. L. Computing Diagrams for the Tetrachoric
Correlation Coefficient. Chicago: Chicago University Press, 1933.
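
The arithmetic of steps 1 through 5 can be sketched in Python as a check on the hand computation. The function name and the cell labels (a upper left, b upper right, c lower left, d lower right) are assumptions introduced for the illustration; reading the tetrachoric r itself still requires the chart. Values are computed from the cell frequencies directly, so a final digit may differ from hand-rounded figures.

```python
def excess_over_chance(a, b, c, d):
    """Steps 1-5 for a fourfold table laid out as

           a  b
           c  d

    Returns the two cutting proportions, the chance, actual and excess
    proportions for the marked cell.
    """
    n = a + b + c + d
    if b <= c:
        # Upper-right cell is marked: cut by the top row and right column.
        marked, row_total, col_total = b, a + b, b + d
    else:
        # Lower-left cell is marked: cut by the bottom row and left column.
        marked, row_total, col_total = c, c + d, a + c
    p_row, p_col = row_total / n, col_total / n
    chance = p_row * p_col
    actual = marked / n
    return {"p_row": p_row, "p_col": p_col, "chance": chance,
            "actual": actual, "excess": actual - chance}

example_a = excess_over_chance(273, 296, 423, 24)
example_b = excess_over_chance(55, 80, 70, 20)
for name, result in (("A", example_a), ("B", example_b)):
    print(name, {k: round(100 * v, 1) for k, v in result.items()})
```

The excess percentage returned for each example is the value located on the top scale of the chart in step 5.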



The chart in Figure I is too small for actual use, but the
author will be glad to furnish without charge a photoprint reproduction
of an 8½ x 11 chart on cross-section paper. In empirical
tests this chart appears to give the same answers as the
Thurstone diagrams.



NEW TESTS* 


Algebra Prognosis Test , by Corydon L. Rich. Designed as a help in 
forecasting a pupil’s work and as a guide for sectioning 
when numbers warrant more than one section. Range: high 
school and college, Working time: 3a minutes. Published 
by the C. A. Gregory Co. 


Aptitude Tests for Occupations by Wesley S. Roeder and Herbert B. 
Graham. There are six tests in the battery attempting to 
measure personal-social, mechanical, general sales, clerical 
routine, computational and scientific aptitudes. Range: high 
school and college students, and adults. Working time: 
1 hour and 50 minutes for complete battery. Published by 
California Test Bureau. 


Children’s Apperception Test, by Leopold Bellak. A personality test
specifically designed for use with children between three and 
ten years of age, of both sexes and of all ethnic groups. Con¬ 
sists of ten pictures of animals in various social situations. 
Price: set of pictures, manuals and 30 record analysis blanks, 
$9.00. Published by C. P. S. Company. 


Comprehensive Examination in Psychology, by M. Pullins Claytor. 
An achievement test for college students in psychology. 
Working time: ?o minutes. Published by the C. A. Gregory 
Co. 


Cooperative General Culture Test (Forms X and Y), by Norman J.
Blair, Jeanne M. Bradford, Miriam May Bryan, Paul J.
Burke and Herbert Danzer. Designed to provide an indication
of the student’s general cultural background. The content
has been determined by the consensus of a number of
scholars in various fields. Consists of six sections covering
current social problems, history and social studies, literature,
science, fine arts and mathematics. Range: college
students. Working time: 180 minutes. Price: test booklets,
per package of 25, $3.90; answer sheets, per package of 25,
$1.70. Published by Cooperative Test Division of the Educational
Testing Service.

* The tests listed bear 1949 or 1950 copyright dates. The addresses of publishers
are given at the end of the section. In some instances, certain details (particularly
prices) are not included because they were not available at the time of going to
press.


Cowan Adolescent Adjustment Analyzer, by Edwina A. Cowan, Wilbert
J. Mueller and Edna Weathers. Intended for use as a
screening device to discover individuals who would profit 
from referral to visiting teachers, psychiatrists, guidance 
counselors, etc. Range: junior and senior high school. Work¬ 
ing time: no time limit. Price: $1.65 per package of 15 tests. 
Published by Bureau of Educational Measurements, Kansas 
State Teachers College. 


Diagnostic Tests of Achievement in Music (Form A), by M. Lela
Kotick and T. L. Torgerson. Enables the teacher to determine
each pupil’s level of mastery of the basic theory and
skills in music and to locate the nature of the weaknesses or
difficulties in music fundamentals for individuals as well as
classes. Range: school music classes. Working time: approximately
45 minutes. Published by California Test Bureau.


Geometry Attainment Test, by R. D. Walton. An achievement test 
for students with 6 months or more of geometry. Working
time: 90 minutes. Price: tests 5/- per dozen; manual, 1/-
each. Published by University of London Press, Ltd. 


Graded Arithmetic-Mathematics Test, by Philip E. Vernon. Con¬ 
structed, like the Stanford-Binet intelligence scale, from 
sets of short problems, one set for each year level. Scores 
are expressed in Arithmetic-Mathematics Ages from 7-21 
years. Range: ages 7-21. Working time: 20 minutes. Published
by University of London Press, Ltd.


Guilford-Zimmerman Temperament Survey, by J. P. Guilford and
Wayne S. Zimmerman. Scores are obtained for the following
areas: general activity, restraint, ascendance, sociability, 
emotional stability, objectivity, friendliness, thoughtfulness, 
personal relations and masculinity. Range: senior high 
school, college and adults. Working time: approximately 45 
minutes. Price: package of 25 reusable answer booklets, 
$3.75; answer sheets, 35?each. Published by Sheridan Supply 
Company. 


Heston Personal Adjustment Inventory, by Joseph C. Heston. Designed
to measure, for guidance purposes, the personal adjustment
of the normal individual in six areas: analytical thinking,
sociability, emotional stability, confidence, personal relations,
home satisfaction. Range: high school and college.
Working time: 40 to 50 minutes (no time limit). Price: $2.25
per package of 25 tests. Published by World Book Company.


Holborn Vocabulary Test for Young Children, by A. L. Watts. Con¬ 
sists of 100 questions concerning body parts, household 
articles, eating, drinking, actions with hands and fingers, 
etc., to be answered orally by the child. Range: 3½ years of
age and upward. Working time: no time limit. Price: 1/-.
Published by George G. Harrap and Company, Ltd.


Iowa Every-Pupil Tests of Basic Skills (Form /), prepared under the 
direction of E. F. Lindquist. A battery of tests designed to
measure certain skills involved in reading, work-study, lan¬ 
guage and arithmetic at the elementary-school level. There 
are fourteen separate tests: Reading Comprehension, Vocabulary,
Map Reading, Use of References, Use of Index, Use of
Dictionary, Graphs, Punctuation, Capitalization, Language
Usage, Spelling, Arithmetic Concepts, Arithmetic Processes
and Arithmetic Reasoning. Range: grades 5-9. Working time: 
five and one-half hours for complete battery. Price: available
upon application. Published by Science Research
Associates.


Metropolitan Readiness Tests, by Gertrude H. Hildreth and Nellie L. 
Griffiths. Consists of six subtests designed to measure a 
child’s readiness to undertake the work of the first grade. 
The first four tests measure comprehension of words and 
sentences and visual perception, the fifth measures number 
knowledge and the sixth measures a combination of visual 
perception and motor control. Contains also a supplementary 
Drawing A Man test. Range: pre-first grade children. Work¬ 
ing time: approximately 60 minutes. Price: $2.10 per package 
of 25 tests. Published by World Book Company. 


Murphy-Durrell Diagnostic Reading Readiness Test, by Helen A. 
Murphy and Donald D. Durrell. Designed to furnish a 
measure of three critical abilities: auditory discrimination, 
visual discrimination, learning rate. It is a test for group 
use. Range: intended for first graders. Working time: tests 1 
and 2 approximately 1 hour (no time limit); learning rate 
test has specific time limits. Price: $1.35 per package of 25 
test booklets; $1.25 per package of flash cards. Published 
by World Book Company. 


Musical Aptitude Test (Series A), by Harvey S. Whistler and Louis 
P. Thorpe. Designed to measure an individual’s aptitude for 
the study of music. Consists of five parts: rhythm recognition, 
pitch recognition, melody recognition, pitch discrimination, 
and advanced rhythm recognition. Range: grades 4-10. 
Working time: approximately 40 minutes. Published by 
California Test Bureau. 


Revere Safety Test, by Revere Copper and Brass, Inc., in cooperation 
with the Psychological Evaluation and Services Center, 
Syracuse University. Designed to measure knowledge of 
correct safety procedures in the industrial situation. The 
subject is required to tell whether each of 162 pictures illustrates 
good or bad safety practices. Four areas of industrial 
safety are covered: general safety, pilings, carrying and 
traffic, tools and machine operation. Working time: 20 
minutes. Price: reusable test booklets, each 30¢; answer 
sheets, package of 25, $1.00; scoring stencil, each 10¢. Published 
by Science Research Associates. 


Small Parts Dexterity Test, by John E. and Dorothea M. Crawford. 
A performance test designed to measure fine eye-hand co¬ 
ordination. Part I measures dexterity in using tweezers to 
insert small pins in close-fitting holes in a plate and to place 
small collars over protruding pins. Part II measures dex¬ 
terity in placing small screws in threaded holes in a plate 
and screwing them down with a screwdriver until they drop 
through the plate into a metal dish below. Working time: 
about 15 minutes. Price: $25.00 complete with manual and 
spare parts. Published by Psychological Corporation. 


SRA Self-Scorer, by Maurice E. Troyer and George W. Angell. A 
new type of answer sheet designed for use with any teacher- 
constructed objective test. Its primary function is to promote 
student learning by immediately revealing whether a 
test question has been answered correctly or incorrectly. 
Questions must be arranged to fit one of the eight answer 
keys. Four types of answer keys are provided: 1) true-false 
(space for 300 questions); 2) true-false and multiple choice 
(space for 210 questions); 3) four-choice (space for 150 
questions); and 4) five-choice (space for 150 questions). 
Each of these types is published in two different forms. 
Price: self-scorer, complete, each $1.50; answer sheets, per 
package of 25, $1.00. Published by Science Research Associates. 


SRA Youth Inventory (Form A), by H. H. Remmers and Benjamin 
Shimberg. A check list of 298 questions that has been designed 
as a tool to help teachers, counselors and school administrators 
to identify quickly the problems that young 
people say worry them most. Range: teen-age students. 
Working time: approximately thirty minutes. Price: reusable 
booklet with answer pad, 48¢; package of 25 answer pads, 
$1.75; scoring stencil, 10¢. Machine scored form: reusable 
answer pad, 42¢; package of 100 answer sheets, $2.90; 
scoring stencils, $2.50. Published by Science Research Associates. 


Social Intelligence Test, by J. A. Moss, T. Hunt and K. A. Omwake. 
Designed to measure one’s ability to get along with others. 
Consists of five parts measuring: judgment in social situa¬ 
tions, memory for names and faces, observation of human 
behavior, interpretation of mental state from spoken or 
written words, and sense of humor. Range: for high school, 
college and industrial use. Working time: 45 minutes (two 
shorter forms of 40 minutes and 30 minutes are available). 
Price: $3.75 per package of 25 tests (regular form). Pub¬ 
lished by Center for Psychological Service, George Washing¬ 
ton University. 


State High School Testing Service for Indiana offers a list of 49 sub¬ 
ject-matter tests, intelligence scales and inventories based 
on the Indiana courses of study, approved text books and 
teaching practices. The list is as follows: Agriculture: Animal 
Husbandry, Farm Shop Tools (Forms A and B); Commercial: 
Commercial Arithmetic, Bookkeeping (first and third semes¬ 
ters), Shorthand (first and third semesters), Typewriting (first 
and third semesters); English: Mechanics of Written Eng¬ 
lish (grades 9-12), Tools of Written English (grades 7-8), 
Purdue Reading Test (grades 7-12); Health and Safety 
Education; Home Economics (high school): Child Development, 
Clothing I (Forms A and B), Clothing II, Foods I, 
Foods II, Home Care of the Sick, Housing of the Family; 
Home Economics (grades 7-8): Care and Play of Children, 
Clothing Problems, Food in the Home, Housekeeping; Lan¬ 
guage: French Recognition Vocabulary (Forms K and L), 
Latin (first and third semester), Spanish (first semester); 
Mathematics: Algebra (first and third semesters), Arithmetic 
Fundamentals (Forms A and B), Plane Geometry (first 
semester), Solid Geometry, Trigonometry; Mechanical 
Drawing; Science: Biology (first semester), Chemistry (first 
semester), General Sciences (first semester), Physics (first 
semester); Social Studies: Civics-Junior High School (first 
semester), Civics-Senior High School (first semester), Civics- 
Senior High School (one semester course), Economics, Amer¬ 
ican History (first semester), World History (first semester), 
Two Thousand Test Items in American History (bound, 
90¢); Guidance: A.C.E. Psychological Examination, Otis 
Quick-Scoring (Gamma Am), Henmon-Nelson (grades 7-12), 
High School Attitude Scale (Forms A and B), Purdue 
Personality Schedule, Maturity Rating Scale, Purdue Physical 
Science Test; Teacher Self-Evaluation: A Diagnostic 
Teacher Rating Scale (grades 4-8, Forms A and B), Purdue 
Rating Scale for Instruction (in lots of 500 or more). The 
prices of these tests range from i |(4 to 6 j 4 , plus 1 ^ per copy 
for tests going out of state. Exact prices may be obtained 
from publishers, State High School Testing Service for 
Indiana, Purdue University, Lafayette, Indiana. 


Tests for Infants 4-12 Weeks Old (Test A), by A. R. Gilliland. Designed 
to measure adaptation to the physical and social 
environment. Price: $2.00 per package of 25 test record 
sheets and examiner’s manual. Test equipment may be ob¬ 
tained from the author at Northwestern University. Pub¬ 
lished by Houghton Mifflin Company. 


Test of English Usage (Forms A & B), by Henry D. Rinsland, Ray¬ 
mond W. Pence, Betty S. Beck and Roland L. Beck. De¬ 
signed to measure the student’s ability to recognize and 
apply the basic rules of English composition. Consists of 
three parts: mechanics of writing; accurate use of words; 
building sentences and paragraphs. Range: high school and 
college. Working time: no time limit. Published by Cali¬ 
fornia Test Bureau. 


Wechsler Intelligence Scale for Children, by David Wechsler. A psy¬ 
chodiagnostic instrument which has grown logically out of 
the Wechsler-Bellevue intelligence scales used with adoles¬ 
cents and adults. In fact, most of the items in the W.I.S.C. 
are from Form II of the earlier scales, the main addition 
being new items at the easier end of each test to permit 
examination of children as young as five years of age. Range: 
primarily for use with the school-age child. Working time: 
45 minutes to 1 hour. Price: $19.50, including manual and 
25 record forms. Published by the Psychological Corporation. 


ADDRESSES OF THE PUBLISHERS AND DISTRIBUTORS OF 
THE TESTS LISTED 

Bureau of Educational Measurements, Kansas State Teachers Col¬ 
lege, Emporia, Kansas. 

Bureau of Publications, Teachers College, Columbia University, New 
York City. 




C. P. S. Company, P.O. Box 42, Gracie Station, New York 1!, New 
York. 

California Test Bureau, 5916 Hollywood Boulevard, Hollywood 28, 
California. 

Center for Psychological Services, George Washington University, 
Washington 6, D. C. 

Educational Testing Service, 20 Nassau Street, Princeton, N. J. 

The C. A. Gregory Company, 345 Calhoun Street, Cincinnati 19, 
Ohio. 

George G. Harrap and Company, Ltd., 182 High Holborn, London, 
W.C. 1, England. 

Psychological Corporation, 522 Fifth Avenue, New York 18, New 
York. 

Science Research Associates, 228 S. Wabash Avenue, Chicago 4, Ill. 

Sheridan Supply Company, P.O. Box 837, Beverly Hills, California. 

The Steck Company, 9th at Lavaca, Austin, Texas. 

University of London Press, Ltd., Warwick Square, London, E.C. 4, 
England. 

World Book Company, Yonkers-on-Hudson, New York. 



THE CONTRIBUTORS 


Anne Anastasi—Ph.D., Columbia University, 1931. Lecturer in 
Psychology, 1929-1930; Instructor in Psychology, 1930-1939, Barnard 
College. Assistant Professor and Chairman of the Department, 
Dept. of Psychology, Queens College, N. Y. City, 1939-1946. Associate 
Professor of Psychology, Graduate School, Fordham University, 
1947-. Author of Differential Psychology; co-author of Fields of Psychology 
and Foundations of Psychology; author of approximately 50 
monographs and articles in psychological journals. Fellow, American 
Association for the Advancement of Science, American Psychological 
Association (Divisional Representative, 1947-), New York Academy 
of Sciences (Chairman, Section of Psychology, 1940). Member, Phi 
Beta Kappa, Sigma Xi, New York State Psychological Association 
(Vice-President, 1945), Eastern Psychological Association (Board of 
Directors, 1944-1946, 1948-; President, 1946-1947). 

Donald K. Beckley—Ph.D., University of Chicago, 1948. Instruc¬ 
tor, Rochester (New York) Institute of Technology, 1939-1941. Staff 
member, Examination Staff for the United States Armed Forces In¬ 
stitute, University of Chicago, 1941-1943. Professor of Retailing and 
Director, Simmons College Prince School of Retailing, 1946-. Co-author 
of Merchandising Techniques and The Retail Salesperson at 
Work, and author of articles on employment testing in retailing. 

Walter R. Borg—Ph.D., University of California, 1948. Assistant 
in Educational Psychology, University of California, 1946-1948. Assistant 
Professor of Educational Psychology, University of Texas, 
1948-. 


Clyde H. Coombs—Ph.D., University of Chicago, 1940. Research 
Assistant, Psychometric Laboratory, 1937-1940. Research Assistant, 
Mathematical Biophysics, University of Chicago, 1940-1941. Personnel 
Psychologist, War Dept., 1941-1946. Assistant Professor of 
Psychology, 1947-1948, Associate Professor, 1948-, University of 
Michigan. Fellow, American Psychological Association. Member, 
Psychometric Society, American Statistical Association, Institute of 
Mathematical Statistics, Phi Beta Kappa, Sigma Xi. 

Lee J. Cronbach—Ph.D., University of Chicago, 1940. Instructor, 
Assistant Professor, Associate Professor, State College of Washington, 
1940-1946. Associate Psychologist, University of California Division 
of War Research, 1944-1945. Assistant Professor of Education, University 
of Chicago, 1946-1948. Associate Professor of Education, 
Bureau of Research and Service, University of Illinois, 1948-. Author 
of Essentials of Psychological Testing, and articles. Fellow, American 
Psychological Association. Member, American Educational Research 
Association. 

William C. Cottle —Ed.D., Syracuse University, 1949. New York 
State Public Schools, 1931-1945. Instructor and Chief of Veterans 
Testing Service, Syracuse University, 1945-1947. Assistant Professor 
of Education and Counselor, 1947-1948; Associate Professor and 
Assistant Director, Guidance Bureau, 1948, University of Kansas. 
Associate Member, American Psychological Association. Professional 
Member, National Vocational Guidance Association. Member, Phi 
Delta Kappa, Kansas Psychological Association, Kansas Academy of 
Science, American Association of University Professors. 

Edward E. Cureton—Ph.D., Columbia University. Associate Pro¬ 
fessor and Professor of Education, Alabama Polytechnic Institute, 
1931-1941. Senior Educational Statistician, U. S. Office of Education, 
1941-1942. Chief, Testing Unit, Civilian Personnel Branch, Hq. 
Army Service Forces, 1942-1943. Chief, Civilian Personnel Research 
Subsection, Adjutant General’s Office, 1943-1945. Chief, Technical 
Operations and Control, Personnel Research Section, Adjutant General’s 
Office, 1946-1947. Staff member, Richardson, Bellows, Henry 
and Co., Inc., 1945-1946; 1947-1948. Professor of Psychology, University 
of Tennessee, 1949-. Fellow, American Psychological Association, 
American Association for the Advancement of Science. Member, 
American Educational Research Association, Institute of Mathemat¬ 
ical Statisticians. Past President, Psychometric Society. 

Frank J. Dudek—Ph.D., University of Southern California, 1947. 
Aviation Psychologist, AAF Aviation Psychology Program, 1942-1946. 
National Research Council Predoctoral Fellow, 1946-1947. 
Assistant Professor of Psychology, Northwestern University, 1947-. 
Associate Member, American Psychological Association. Member, 
Midwestern Psychological Association, Phi Beta Kappa, Sigma Xi, 
Phi Delta Kappa, Phi Kappa Phi, Psi Chi, American Association of 
University Professors. 

William Leroy Jenkins—Ph.D., University of Michigan, 1936. Instructor, 
Assistant Professor, Lehigh University, 1935-43. Research 
Associate, University of California Division of War Research, 1943-44. 
Supervisor, Training Aids, Columbia University Division of War 
Research, Submarine Training Section, 1944-45. Associate Professor 
of Psychology, Lehigh University, 1946-. Author of articles on cutaneous 
sensitivity. Member, American Psychological Association. 

Robert B. Kamm —Ph.D., University of Minnesota, 1948. Member, 
Counseling Staff, The General College, University of Minnesota, 
1946-1948. Dean of Students, Drake University, at present. Associate 
Member, American Psychological Association. Member, American 
College Personnel Association, Phi Delta Kappa. 





Robert W. Kleemeier —Ph.D., University of Michigan, 1942. 
Teaching Fellow, University of Michigan, 1938-1941. Clinical Coun¬ 
selor and Instructor in Psychology, University of Illinois, 1941-1942. 
Instructor in Psychology, Northwestern University, 1942-1943. Classification 
and Selection Officer, U. S. Maritime Service, 1943-1945. 
Assistant Professor of Psychology, Northwestern University, 1946-. 
Fellow, American Psychological Association. Member, Midwestern 
Psychological Association, Phi Beta Kappa, Sigma Xi, Phi Sigma, 
American Association of University Professors. 

William A. McClelland—Ph.D., University of Minnesota, 1948. 
With the U. S. Army Air Forces: Aviation Psychologist, 1942-1946. 
Instructor, University of Minnesota, 1946-1948. Assistant Professor 
of Psychology, Brown University, 1948-. Member, Phi Beta Kappa, 
Sigma Xi, Psi Chi, Phi Delta Kappa, American Psychological Asso¬ 
ciation, National Vocational Guidance Association, American Asso¬ 
ciation for the Advancement of Science, Psychometric Society. 

Joseph E. Moore —Ph.D., George Peabody College, 1935. Instruc¬ 
tor in Psychology, North Carolina State College, 1931-1935. Pro¬ 
fessor of Psychology, George Peabody College, 1936-1942. With the 
U.S. Army: Classification Officer, 1942-1945; Personnel Consultant, 
1944-1945. Professor and Chairman of the Department of Psy¬ 
chology, Georgia Institute of Technology, 1945-. Member, American 
Psychological Association, Diplomate in Industrial Psychology, 
American Board of Examiners in Professional Psychology, Southern 
Society of Philosophy and Psychology. President, Georgia Psycholog¬ 
ical Association, 1948. Professional Member, American Vocational 
Guidance Association, Phi Kappa Phi, Phi Delta Kappa. 

William A. Reynolds —M.A., University of California, 1941. In¬ 
structor in Psychology, Bakersfield Junior College, 1941-1942. Psy¬ 
chologist, War Dept., Placement and Testing Branch, McClellan 
Field, Sacramento, Calif., 1942-1944. Statistical Analyst, War Dept., 
Wright Field, Dayton, Ohio, 1944-1945. Research Associate, Re¬ 
search Dept., National Broadcasting Co., 1946-1949. Lecturer, New 
York University, 1947, 1949. Author of articles on testing and person¬ 
nel placement, statistical methods, radio research. Member, American 
Psychological Association, New York State Psychological Associa¬ 
tion, American Association for Public Opinion Research, American 
Statistical Association, American Marketing Association, Institute of 
Mathematical Statistics. 

Edward A. Rundquist —Ph.D., University of Minnesota, 1932. 
Assistant Psychologist, Child Guidance Clinic, Minneapolis Public 
Schools, 1928-1929. Instructor in Research, Institute of Child Wel¬ 
fare, University of Minnesota, 1929-1930. Chief Psychologist, Child 
Guidance Clinic, Minnesota Public Schools, 1930-1933. Research 
Fellow, University of Minnesota, 1933-1934. Chief Psychologist, 
Child Study Department, Minneapolis Public Schools, 1934-1935. 
Assistant Director, Psychological Laboratory, Cincinnati Public 
Schools, 1935-1941. Various positions in Personnel Research Section, 
Adjutant General’s Office, 1940-1946. Assistant Director Personnel 
Research, Owens-Illinois Glass Company, 1946-1949. Chief, Personnel 
Evaluation and Criterion Research, Personnel Research Section, 
Adjutant General’s Office, 1949-. Co-author with Raymond F. Sletto 
of Personality in the Depression. Author of articles in various journals. 
Member, American Psychological Association, Midwestern Psycho¬ 
logical Association, Sigma Xi. 

H. Wallace Sinaiko—M.A., University of Minnesota, 1947. Graduate 
Teaching Assistant, Dept. of Psychology, University of Minnesota, 
1946-1947. Assistant Employment Manager, L. Bamberger 
& Co., Newark, N. J., 1947-1949. Graduate student, New York 
University, 1949-. Research Psychologist with Human Engineering 
Laboratory, Research Division, College of Engineering, New York 
University, 1949-. Member, American Psychological Association, 
Eastern Psychological Association, New Jersey Psychological Association, 
Psi Chi. 

Edward A. Suchman—Ph.D., Columbia University, 1947. Research 
Assistant, Princeton University, 1937-1939. Research Fellow, Rockefeller 
Foundation, 1939-1940. Research Associate, Columbia University, 
1940-1942. Research Associate, Research Branch, War Dept., 
1942-1946. Research Associate, Social Science Research Council, 
1946-1947. Assistant Professor and Executive Officer, Dept. of Sociology; 
Associate Director, Social Science Research Center, Cornell 
University, 1947-. Author of articles in sociological and psychological 
journals. Co-author of The American Soldier: Studies in Social Psychology 
in World War II. Member, American Psychological Association, 
American Sociological Society, Sociological Research Association, 
Association for American Public Opinion Research. 

C. Gilbert Wrenn—Ph.D., Stanford University, 1932. Vocational 
Counselor, Stanford University, 1928-1936. Associate Director, Gen¬ 
eral College; Associate Professor of Educational Psychology, 1936- 
1938; Professor of Educational Psychology, 1938-, University of 
Minnesota. Consultant, Student Personnel, Teacher Education Com¬ 
mission of the American Council on Education, 1939-1942. On 
military leave with the U. S. Armed Forces, Personnel Officer in 
Bureau of Naval Personnel and Pacific Area, 1942-1946. Associate, 
American Youth Commission, 1939-1941. Author and co-author of 
Student Personnel Problems, Studying Effectively, Aids to Group Guidance, 
Time on Their Hands, and numerous journal articles. President, 
National Vocational Guidance Association. Vice-President, Council 
of Guidance and Personnel Association, 1946-. 




EDUCATIONAL and 
PSYCHOLOGICAL 



MEASUREMENT 



VOLUME TEN, NUMBER TWO, SUMMER, 1950 


The Theory and Classification of Criterion Bias. Hubert E. Brogden 
and Erwin K. Taylor. 

An Investigation of Two Hypotheses Regarding the Nature of the 
Spatial-Relations and Visualization Factors. William B. Michael, 
Wayne S. Zimmerman and J. P. Guilford. 187 

On the Use of Interactions as “Error Terms” in the Analysis of Variance. 
Allen L. Edwards. 214 

The Objective Measurement of Dynamic Traits. R. B. Cattell, A. B. 
Heist, P. A. Heist and R. G. Stewart. 224 

The Construction and Validation of a Work-Type Auditory Comprehension 
Reading Test. George Spache. 249 

Validation and Standardization of the AGO General Mechanical Aptitudes 
Test for the Selection of Civilian Employees in War Department 
Installations. Adam Poruben, Jr. 254 

Three Aids in the Evaluation of the Significance of the Difference Between 
Percentages. C. H. Lawshe and P. C. Baker. 263 

A Study of Faking on the Kuder Preference Record. Orrin H. Cross. 271 

Psychological Testing for Immigrants in a Vocational Counseling 
Agency. Benjamin Balinsky. 278 

An Investigation of the Personality Traits of Art Students. Martin 
Spiaggia. 285 

The Knowledge of General Education of a Sample of Syracuse University 
Students as Revealed by the Cooperative General Culture Test and the 
Time Magazine Current Affairs Test. N. M. Downie, M. E. 
Troyer and C. R. Pace. 294 

The Full-Range Picture Vocabulary Test: II. Selection of Items for 
Final Scales. Robert B. Ammons and Leo D. Rachiele. 307 

Does Face Validity Exist? Sidney Adams. 320 

Administration of the Purdue Pegboard Test to Blind Individuals. 
James W. Curtis. 329 

Evaluating Psychometric Proficiency. Frank M. DuMas. 332 

Interest and Personality Measures of Veteran and Non-Veteran University 
Freshman Men. Katherine K. Fassett. 338 

Award in Student Personnel Research. C. Gilbert Wrenn. 342 

Quick Estimation of Multiple R. William Leroy Jenkins. 346 




















THE THEORY AND CLASSIFICATION OF 
CRITERION BIAS 


HUBERT E. BROGDEN 
and 

ERWIN K. TAYLOR 
Personnel Research Section, AGO* 

Introduction 

In that area of psychology concerned with the development 
of tests and other predictive instruments, psychologists have 
continually emphasized the need for validation. This insistence 
is sufficiently pronounced to serve as a trade mark of professional 
psychologists. It is consistent with this insistence upon 
validation that the importance of the criterion problem has 
been widely recognized. This is particularly true of the many 
psychologists connected with the various testing programs conducted 
during World War II. However, little attention and less 
effort have been devoted to a systematic consideration of the 
problems involved in criterion construction. Publications by 
Bellows (1), Stuit (13), Toops (15), Viteles (18) and Guilford 
(9) are among the few dealing particularly with these problems. 
Any systematic consideration of the problems involved in 
criterion construction inevitably leads to the problem of bias: 
to a consideration of the ways in which components which 
should properly be a part of the criterion are omitted; to the 
ways in which extraneous components are introduced; and to 
how distortion of weighting or of scale units occurs. 
This paper will attempt a systematic consideration of these 
problems. A classification of bias will be introduced and related 
to the steps involved in criterion construction. The more specific 
problems of bias encountered will then be discussed in relation 
to this classification system and in relation to the various types 
of criteria (i.e., production records, ratings, achievement tests, 
etc.). 


* The opinions expressed are those of the authors and do not necessarily express the 
official views of the Department of the Army. 





Before proceeding further we should like to discuss two points 
important to the authors’ general orientation in attacking criterion 
problems. The essence of the first point in question may 
be stated as follows: In seeking to define criterion problems— 
particularly those of criterion bias—it must be recognized that 
the objective of criterion construction is subsidiary to that of 
selecting the most efficient battery of predictors. Prediction 
instruments are validated for the purpose of picking the best 
selection battery, assigning appropriate weight to each of its 
several components, and determining the effectiveness of the 
battery. The criterion achieves its sole function if it makes these 
objectives of validation possible. In the development of an industrial 
selection program, for example, the criterion should 
give an accurate and unbiased measure of the extent to which 
individuals in the validation population contribute to or detract 
from the efficiency of the organization. This may be taken as 
axiomatic. If so, the emphasis in criterion construction must be 
in terms of the objectives of the prediction problem. 

Criteria differ from predictors in that the former must be 
tested in terms of a concept that we carefully avoid in the latter. 
In constructing or choosing from among existing predictors, an 
empirical approach can be, and often is, profitably used. Recourse 
to previous research results, information based on job 
analysis, hunches, hypotheses, and intelligent guesses all provide 
legitimate bases upon which to predicate a potential 
selection battery. Wrong guesses can be costly in terms of 
wasted research resources, but they are not misleading since 
they are put to the empirical test of how well each accomplishes 
the objectives of the prediction task, i.e., how well each correlates 
with the criterion. 

The criterion, by contrast, can be subjected to no wholly 
satisfactory empirical test of its adequacy. The criterion must, 
consequently, be logically justifiable as valid in its own right. 
The remainder of this paper is predicated on the acceptance of 
this point of view. Invalid and biased criteria, again in contrast 
to predictors, cannot be eliminated through empirical demonstration 
of their inadequacy. Thus, the faulty criterion not only 
wastes research efforts, but seriously reduces the effectiveness 
of the final outcome of the program. 





For the purpose of this discussion, a biasing factor may be 
defined as any variable, except errors of measurement and 
sampling error, producing a deviation of obtained criterion 
scores from a hypothetical “true” criterion score. It is apparent 
that this definition is quite general and leads to the consideration 
of all factors which bear upon the desirability or undesirability 
of criterion elements and their combination. Of course, 
the practical consideration which faces the research worker in a 
“real” situation precludes the complete elimination of all 
undesirable aspects of criterion construction. Perfection may 
be approached—it is not likely to be achieved. Nonetheless, 
to improve his criteria to the point optimal for the conditions 
under which he is working, the research psychologist must know 
the importance of different types of bias, the manner in which 
each will probably affect his results, the proper emphasis to 
place upon the elimination of those factors producing a distortion 
of results of indeterminate magnitude, and, finally, the 
probable effect of bias that cannot be entirely eliminated. It 
will be shown that different types of biasing factors vary widely 
in their distortive effect, generally as a function of the degree of 
their correlation with the members of the predictive battery. 
Some biasing factors influence the validity coefficients but have 
little or no effect on estimates of criterion reliability. Others 
affect both. Still others may alter the apparent reliability of the 
criterion without seriously influencing the validity. 
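
The dependence of the distortion on the correlation between a biasing factor and the predictors can be checked with a small numerical illustration. The following is a minimal sketch and not the authors' procedure: the eight-person score vectors, the names `test_a` and `test_b`, and the helper functions are all hypothetical, and the components are chosen to be exactly uncorrelated so that the arithmetic can be verified by hand.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def add(x, y):
    """Element-wise sum of two score lists."""
    return [a + b for a, b in zip(x, y)]

# Hypothetical, mutually uncorrelated components for eight workers.
true_criterion = [1, 1, 1, 1, -1, -1, -1, -1]
other_ability = [1, 1, -1, -1, 1, 1, -1, -1]
bias_in_tests = [1, -1, 1, -1, 1, -1, 1, -1]    # also measured by test_b
bias_unrelated = [1, 1, -1, -1, -1, -1, 1, 1]   # measured by no predictor

test_a = add(true_criterion, other_ability)  # a partly valid predictor
test_b = bias_in_tests                       # a test that predicts only the bias

# Two obtained criteria, each the "true" criterion plus one biasing factor.
criterion_1 = add(true_criterion, bias_in_tests)   # bias correlated with test_b
criterion_2 = add(true_criterion, bias_unrelated)  # bias unrelated to predictors

for label, crit in [("true criterion", true_criterion),
                    ("bias correlated with a predictor", criterion_1),
                    ("bias unrelated to predictors", criterion_2)]:
    print(label,
          "| test_a:", round(pearson(test_a, crit), 3),
          "| test_b:", round(pearson(test_b, crit), 3))
```

Under these assumptions the validity of `test_a` falls from about .71 to .50 with either biasing factor, but only the factor shared with `test_b` gives that test an apparent validity (about .71) that it does not possess against the "true" criterion. The unrelated factor lowers validities without changing which test appears the better predictor.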

Classification of Biasing Factors 

Imperfections or bias in the criteria may be classified as: 

(1) Criterion Deficiency —omission of pertinent elements from 
the criterion. 

(2) Criterion Contamination —introducing extraneous elements 
into the criterion. 

(3) Criterion Scale Unit Bias —inequality of scale units in 
the criterion. 

(4) Criterion Distortion —improper weighting in combining 
criterion elements. 

The above classification of criterion bias is functional in terms 
of the steps the authors consider essential to adequate criterion 
construction. These steps may be indicated as follows: 





(1) Careful analysis of the total situation in which the criterion 
behavior occurs for the purpose of isolating all 
sub-criterion variables and obtaining preliminary estimates 
of their relative importance—the determination 
of what is to be measured. 

(2) The construction of procedures and/or scales for the 
measurement of these elements—determination of how 
each element is to be measured. 

(3) Development of a procedure for combining these elements 
into the desired single composite—determination 
of the relative importance of each element to over-all 
efficiency. 

Criterion deficiency is most apt to occur in the process of determining 
the variables to be included in the criterion. Contamination 
and criterion scale-unit bias are most likely to appear 
in the process of constructing scales for the measurement of the 
sub-criterion elements, while criterion distortion results primarily 
from faulty methods of combining the criterion elements. 
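
One familiar way in which a faulty method of combination can distort the composite is the simple addition of raw scores from elements measured on very different scales. The sketch below is an illustration under stated assumptions rather than a method proposed in this paper: the five workers, the `quantity` and `quality` scores, and the judgment that the two elements are equally important are all hypothetical, and standardizing before summing is only one common remedy.

```python
from math import sqrt

def mean(x):
    return sum(x) / len(x)

def sd(x):
    """Standard deviation computed over the whole group (divisor n)."""
    m = mean(x)
    return sqrt(sum((v - m) ** 2 for v in x) / len(x))

def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (len(x) * sd(x) * sd(y))

def standardize(x):
    """Convert raw scores to standard scores (mean 0, s.d. 1)."""
    m, s = mean(x), sd(x)
    return [(v - m) / s for v in x]

# Hypothetical sub-criterion elements for five workers.
quantity = [100, 120, 140, 160, 180]  # units produced; large spread
quality = [5, 3, 4, 1, 2]             # supervisor rating; small spread

# Nominally equal weights applied to raw scores.
raw_composite = [q + r for q, r in zip(quantity, quality)]

# The same nominal weights applied after standardizing each element.
std_composite = [a + b for a, b in
                 zip(standardize(quantity), standardize(quality))]

for label, comp in [("raw sum", raw_composite),
                    ("standardized sum", std_composite)]:
    print(label,
          "| r with quantity:", round(pearson(comp, quantity), 3),
          "| r with quality:", round(pearson(comp, quality), 3))
```

With these figures the raw sum is almost identical to the quantity record and is negatively related to the quality rating, so the element with the larger spread has in effect received nearly all of the weight. After standardization the two elements contribute equally to the composite.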

Each of the three steps of criterion construction is necessarily 
involved, however sketchily, in the development of any criterion. 
The rationale of our classification of bias is so intimately 
related to the belief in the need for an explicit plan of construction 
involving these three steps as to justify further clarification 
of the implications of each in its relation to bias. 

The desirability of establishing the variables important to 
“success" by observation and job analysis (step 1) before pro¬ 
ceeding to scale construction (step a) and the combination of 
sub-criterion variables (step 3) deserves special emphasis, From 
reports of validation studies found in the literature, it may be 
judged that the usual first step in criterion development is the 
search for available criterion measures. The psychologist em¬ 
ploying this procedure very often arrives at a decision as to 
criterion content that is undesirably influenced by factors of 
availability. The discovery of several already available or 
readily obtained measures that are apparently suitable is in¬ 
clined to lead to neglect of the systematic observation and 
analysis necessary to insure that all important aspects of on-the- 
job productivity have been identified. In choosing criteria on 
the basis of availability, method of measurement as well as 



CRITERION BIAS 


163 

nature of variables usually is also a function of convenience 
rather than of desirability. Without accomplishing step 1 before 
deciding upon the means by which the criterion variables are 
to be measured, a systematic consideration of alternate methods 
of scale construction or measurement and choice of the optimal 
method for each criterion variable is not likely to be made. 
While it is recognized that, in many cases, the final decision as 
to the method of measurement will have to be made in the light 
of economy and available research resources, it is the firm belief 
of the authors that there is generally enough freedom of choice 
within the limitations imposed by even a policy of strict expe¬ 
diency, to justify the type of analysis proposed. At least the 
decision can be made with full and explicit recognition of the 
basis for making it. It might be added, parenthetically, that the
careful accomplishment of step 1, in addition to insuring the
adequacy of criterion variables, frequently serves the additional
function of supplying valuable clues as to possible predictors. 
Savings realized through this means may in part, if not entirely, 
offset the extra cost and effort required to make a thorough 
observation and analysis. 

Criterion Bias and Predictor Correlation 

To this point, our classification and discussion of bias have 
been in terms of the criterion alone. Since effort expended in 
constructing a bias-free criterion is, as we have stressed before, 
directed ultimately toward the proper choice and weighting of a 
battery of predictors, it is essential to consider the effect of 
criterion bias on the degree to which this objective is realized. 

Biasing factors correlating with the predictors will obviously 
distort the validities and the partial regression weights of the 
various predictors. They may even result in the inclusion of 
tests in the battery that predict only bias and have no relation¬ 
ship to the “true” criterion. The introduction of bias having no 
relation to the predictors is, on the other hand, equivalent, in 
effect, to an increase in the error of measurement of the cri¬ 
terion. The relationship of all predictors to the criterion will be 
attenuated. But this attenuation will be proportional for all
predictors. Consequently, the relative magnitude of the validities
and the partial regression coefficients will be unaffected.
This leads to the highly important conclusion that the “true”
validity of the weighted composite resulting from the validation 
study remains substantially unaffected by test-free bias, even 
though the exact magnitude of this validity cannot be esti¬ 
mated. With these considerations in mind, we may further 
classify biasing factors into those which are predictor correlated 
and those which are predictor free. 
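The distinction between predictor-correlated and predictor-free bias can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the observed criterion is the sum of a “true” success component and a bias component uncorrelated with it. The function name `observed_validity`, the test names and all numerical values are hypothetical and are not taken from any study.

```python
import math

def observed_validity(r_true, r_bias, bias_share):
    """Correlation of a predictor with a criterion made of true success plus bias.

    Assumes observed criterion = true component + bias component, with the two
    components uncorrelated. `bias_share` is the fraction of observed criterion
    variance due to bias; `r_true` and `r_bias` are the predictor's correlations
    with the true component and the bias component respectively.
    """
    return r_true * math.sqrt(1.0 - bias_share) + r_bias * math.sqrt(bias_share)

# Hypothetical true validities for two tests.
true_validities = {"test_A": 0.50, "test_B": 0.25}

# Predictor-free bias (r_bias = 0) making up 36 per cent of criterion variance:
# every validity is multiplied by the same factor, so their ratio is unchanged.
free = {name: observed_validity(r, 0.0, 0.36) for name, r in true_validities.items()}

# The same amount of bias, now correlated .60 with test_B only:
# the ordering of the two tests is reversed.
correlated = {
    "test_A": observed_validity(0.50, 0.0, 0.36),
    "test_B": observed_validity(0.25, 0.60, 0.36),
}

print(free)
print(correlated)
```

Under these assumptions the predictor-free case leaves test_A twice as valid as test_B, as before, while the predictor-correlated case makes the weaker test appear the stronger one.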

The authors do not wish to imply that the attenuating effect 
of test-free bias is of little import. In addition to the attenuation 
of the validity coefficients and partial regression weights, two 
other undesirable results will accrue from the introduction of 
test-free bias into the criterion: (1) The sampling error of the 
validity and regression weights will tend to increase, thus 
rendering these statistics less stable from sample to sample, and 
(2) biasing factors that are test free may, none the less, distort 
estimates of the reliability of the criterion in an indeterminate 
manner. 

The first of these faults may be overcome by increasing the 
size of the experimental population if additional cases are avail¬ 
able with, of course, a resulting increase in the cost of the re¬ 
search. The problem of correcting for the unknown effect of 
test-free bias on criterion reliability is more difficult, and 
possible solutions are usually less satisfactory. Such possible 
solutions are, in any event, particular to the nature of the bias¬ 
ing factors. 
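The loss of stability mentioned under (1) can be sketched numerically. The example below relies on the familiar large-sample approximation for the standard error of a correlation, (1 - r^2) / sqrt(n - 1), which is an assumption brought in here and not a formula from the text; the validities and sample size are hypothetical.

```python
import math

def approx_se_of_r(r, n):
    """Large-sample approximation to the standard error of a correlation."""
    return (1.0 - r * r) / math.sqrt(n - 1)

n = 101                 # hypothetical number of cases
r_bias_free = 0.50      # hypothetical validity against a bias-free criterion
r_attenuated = 0.40     # the same validity after attenuation by test-free bias

for label, r in (("bias-free", r_bias_free), ("attenuated", r_attenuated)):
    se = approx_se_of_r(r, n)
    # The standard error relative to the size of the coefficient is what
    # governs how stable the coefficient is from sample to sample.
    print(label, "SE:", round(se, 3), "SE relative to r:", round(se / r, 2))
```

Because the approximate standard error shrinks with the square root of n - 1, adding cases can offset the loss, which is the remedy described in the preceding paragraph.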

In spite of these adverse effects of test-free bias, it is believed 
that, effectively, it is the presence or absence of test-correlated 
bias that “makes” or “breaks” the criterion. 

Criterion Deficiency 

Before beginning our discussion of criterion deficiency, a dis¬ 
tinction should be made between criteria designed to measure 
over-all proficiency on a particular job and those concerned 
with success in specific job elements. The validation problems 
involved are both legitimate. In the latter case, it may be 
desired to measure success in a job element common to a wide 
variety of job classifications in order to validate a test designed 
specifically to predict this element. Adequate validation sam¬ 
ples can sometimes be obtained only by combining groups from
a wide variety of jobs, all of which share the concerned element. 
The problem of criterion deficiency would not usually be perti¬ 
nent to validation studies of this nature. Our concern, in any 
event, will be exclusively with criterion deficiency as it occurs 
in criteria of general on-the-job success. 

Criterion deficiency is present to a greater or less degree in all 
studies involving criteria of general success. While it is doubtful 
that a criterion could be built which would take into account 
all aspects of on-the-job performance, it is the authors' opinion 
that the high incidence of deficiency may be avoided by a more 
systematic approach to the problem of determining criterion 
elements. In the light of our earlier discussion of the relation¬ 
ship between biasing factors and the steps essential to criterion 
construction, it is apparent that it is in step 1—the analysis of 
the situation in which the criterion behavior occurs—that cri¬ 
terion deficiency is most likely to materialize. Adaptation of the 
principles of worker analysis can probably be made so as to 
minimize criterion deficiency in prediction problems. 

The systematic investigation of the situation in which the 
criterion behavior occurs serves several valuable functions. 
First, it minimizes the possibility of overlooking important 
criterion elements. Second, it supplies the investigator with 
valuable clues as to the most practical means of measuring the 
several criterion elements. Third, the analysis supplies some 
initial estimates of the relative importance of the several cri¬ 
terion elements. Thus, if available facilities require limitation 
of the criterion to a bare minimum, an intelligent judgment may 
be made as to which elements may be omitted from the study 
with least harm. Finally, an analysis of the criterion situation in 
advance of any other steps in the study will generally shed con¬ 
siderable light on the nature of the predictors most likely to be 
valid. This can eliminate considerable loss of valuable testing 
time and may result in batteries of greater validity than would 
usually be the case with predictors chosen on a less sound basis. 

The “critical incident” technique for the construction of
rating scales as expounded by Flanagan (7) appears to offer 
promise as a means of reducing criterion deficiency in rating. 
Not enough is yet known concerning the use of the method to 
permit a considered judgment of its value for this purpose. 





One factor frequently making for criterion deficiency is the 
inclination of investigators to employ only one type of criterion 
measure. Studies using ratings usually use only ratings; those
in which production records are used, use only production 
records; where job samples are employed, neither ratings nor 
production records are likely to enter into the picture. If an 
adequate analysis of the job situation were accomplished and a 
decision as to criterion content were made before consideration 
is given to the most desirable measuring techniques for each 
job element, it would seem that production records would often 
be found most desirable for some of the criterion elements and 
ratings or job samples most desirable for other elements. 

Composite criteria consisting of a variety of production in¬ 
dexes seem, in practice, to be most frequently and most obvi¬ 
ously subject to criterion deficiency. The difficulties involved 
in devising and putting into operation the procedures necessary 
to obtain production records for those job elements for which 
none already exist often constitute the determining factor in 
such instances of criterion deficiency. A systematic approach 
to criterion construction will do much to minimize such bias.
If the important job elements influencing over-all efficiency are 
isolated first of all, gaps in the total job picture become more 
readily apparent and measures may be obtained of those ele¬ 
ments necessary to complete the criterion composite in the 
manner that is most practical in the particular situation. If it is 
found at that time that production records cannot be made 
available for the measurement of all criterion elements, ratings,
job samples, or other means may be devised to eliminate the 
gaps in the composite. 

In considering criterion deficiency in relation to rating 
criteria, we must distinguish between over-all ratings and com¬ 
posites derived from separate evaluations for each element. In 
the latter case, there is the same need for systematic analysis of 
the job situation for the determination of the elements to be 
evaluated as in the construction of production record or mixed 
criteria. Generally, rating criteria, whether as separate element 
ratings or as over-all, undertake to account for a larger part of 
the total job than is the case with production criteria. Thus, 
criterion deficiency is probably somewhat less prominent in
ratings than in ordinary production record criteria. Bias un¬ 
doubtedly does occur because of improper weighting. It should 
be pointed out that so little weight is given to some factors that 
the criterion distortion introduced practically amounts to 
criterion deficiency. 

It should be recognized that in the use of over-all ratings of 
effectiveness, the problem of criterion deficiency has not been 
solved. Rather, it has been placed in the laps of the raters. The 
extent to which such rating will be deficient depends, of course, 
upon the extent to which each of the raters has included each 
of the important elements of success in making his rating. It 
may be expected that different raters will incorporate different 
elements into their composites and that, in effect, there will 
be a different amount and kind of criterion deficiency in the 
estimates obtained from different raters, if not in different rat¬ 
ings made by a single rater. When limitations of the research 
study require the use of over-all ratings as the criterion, it 
would seem advisable to incorporate a careful definition of the 
important job elements in the directions for the execution of the 
ratings. This, if properly accomplished, should help to reduce the 
extent of criterion bias and to insure that the evaluations of the 
several raters are predicated on a more uniform constellation of 
elements than would otherwise be the case. 

The foregoing comments appear to provide sufficient con¬ 
sideration of criterion deficiency in relation to ratings. Because 
of the effect of halo (discussed below), it is difficult to consider 
this problem intelligently. Ratings of different job elements are 
often found to be so highly interrelated that one suspects that 
the rater’s impression of the ratee’s competence is the only 
determining factor of general importance. Because of this 
effect, the authors do not wish to give the impression that ad¬ 
herence to the foregoing suggestions will produce substantial 
improvement in the results obtained. 

An examination of research reports indicates that, in general, 
systematic job analysis is an initial step in the construction of 
job-sample criteria more often than in the construction of any 
other type of criterion measures. In spite of such systematic 
job analysis, it is the authors' opinion that important elements 
of on-the-job success are usually omitted from job-sample
criteria. Much of the difficulty in this respect arises because the 
job-sample criterion indicates how well the employee can 
perform under standard conditions rather than how well he 
does perform under normal work-a-day conditions. It could 
possibly be argued that for the validation of aptitude and 
achievement tests, as opposed to personality measures, this is 
precisely what is desired. Where on-the-job success is a function 
of personality variates, however, job-sample criteria are apt to 
be criterion deficient. As a result of the exclusive use of such 
criteria, truly valid measures of personality differences would 
be excluded from the battery selected for operating use. Thus, 
while the use of job-sample criteria may be recommended for the 
evaluation of production in certain types of situations, it is 
doubted that they should ever be used alone as a measure of 
over-all on-the-job success. 

Criterion Contamination 

Criterion construction based on arm-chair considerations or
factors of availability, rather than on an analysis of the job 
situation, faces not only the danger of omitting important fac¬ 
tors but also that of incorporating variables that are not meas¬ 
ures of on-the-job success. While contaminants of the criterion 
occur in the process of deciding what to measure, it is in the 
construction of the actual scales, or other means of measure¬ 
ment, that the investigator most frequently faces the problem 
of contamination. 

From our outline of the steps in criterion construction it will 
be noted that procedures and/or instruments for making such 
measurements must be devised as a second step following the 
determination of the job elements in need of measurements. 
In discussing contamination in relation to the major types of 
criterion measures, the broader meaning of the term as em¬ 
ployed here should be borne in mind. The more conventional 
usage of the term limits it to contamination introduced by di¬ 
rect influence of predictor scores on the criterion. The basic 
example is the effect of knowledge of predictor scores on criterion
ratings. Bellows (1) extended the meaning of the term to
include such phenomena as opportunity bias and artificial 
restriction of production. In the present paper, as has previously
been noted, any source of variance in the criterion, other than 
error of measurement, that is not a reflection of on-the-job success,
is labelled “criterion contamination.” Thus, our definition
includes all extraneous elements in the criterion. However, 
several additional concepts will be introduced to aid in dis¬ 
tinguishing between different types of contamination. 

In production records, contamination most frequently occurs
because factors beyond the control of the individual worker
considerably affect the amount of his production. This type of
contamination has been referred to as opportunity bias.

Examples of opportunity bias may be cited for almost any 
type of job. In evaluating salesmen such bias may occur be¬ 
cause of differences in the “goodness” of territory; in evaluating 
production line workers, it may occur because of differences 
between day and night shifts, in the location of the work site, in 
tools and machines, in the efficiency of supervisors, or in work¬ 
mates and repairmen. Differences between day- and night- 
shift workers may be substantial even though no differences 
exist as to potential productivity. If samples of production are 
obtained at different times for different workers, diurnal varia¬ 
tions in productivity may bias the obtained criterion scores. 
Thus, it is known that work output definitely varies according 
to the time of day. Hence, records of production obtained on 
individuals at the time of optimal output would be biased in 
relation to those obtained at the time of minimal output. A 
comprehensive listing of the sources for opportunity bias is 
impossible. A careful analysis of the conditions of work of the 
various members of the experimental group during the collec¬ 
tion of criterion measures is necessary to insure identification of 
such biasing factors. 

The most important question to be answered with reference 
to opportunity bias is the degree to which it is test correlated or 
test free. First of all, the possibility that components of the 
experimental predictor battery were employed in determining 
who would be placed in the position where opportunity for 
high production record was greatest, should be checked. For 
example, if tests or other variables in the predictor battery were 
employed to determine which salesman obtained the best terri¬ 
tory or which sales clerk was given the best counter, etc.,—as
might often occur if the selection procedures being validated 
were actually used in the operating selection program—the 
effect on validity of such biasing factors could be very considerable.

Suppose that 10 per cent of the variation in amount of production
in a given job were due to some form of opportunity
bias, and that placement in a position of greater opportunity
had been made on the basis of a test employed for prediction prior to
the initiation of a research study. If, in this research study, this
predictor was evaluated along with other experimentally constructed
instruments, we can compute that the obtained validity
(biased by the opportunity factor) would be .31, even
though its actual validity were zero. It may be seen that the
resulting contamination would be highly destructive to the
objectives of the research study.
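The computation behind this figure is not shown. One reading that comes close to it, offered here only as an assumption, is that the test correlates perfectly with the opportunity factor and not at all with true productivity, so that the obtained validity is the square root of the proportion of criterion variance due to opportunity. The function name `spurious_validity` and its parameters are hypothetical.

```python
import math

def spurious_validity(bias_share, r_test_bias=1.0, r_test_true=0.0):
    """Obtained validity when part of the criterion variance is opportunity bias.

    Assumes criterion = true productivity + opportunity component, the two
    being uncorrelated. `bias_share` is the fraction of criterion variance
    due to opportunity; the defaults describe a test that fixes opportunity
    completely and has no relation to true productivity.
    """
    return (r_test_true * math.sqrt(1.0 - bias_share)
            + r_test_bias * math.sqrt(bias_share))

# Ten per cent of criterion variance due to opportunity, true validity zero.
print(round(spurious_validity(0.10), 3))

# A weaker link between test score and placement gives a smaller spurious value.
print(round(spurious_validity(0.10, r_test_bias=0.5), 3))
```

Under this reading the first value is about .316, in line with the .31 quoted; the second shows that the spurious validity falls in proportion as placement depends less completely on the test.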

Even if no direct evidence of relationship is found between 
any predictor and opportunity bias in criterion scores, evidence 
of indirect relations should be sought. If seniority were to 
determine placement in the position of greatest opportunity, 
predictor variables such as age and experience would show 
heavily biased validity. Personal history items, bearing directly 
or indirectly on the length of experience or age, would have 
similarly biased validities. Other possibilities may be cited.
Questionnaire items relating to marital status may appear to 
have high validity because a much higher percentage of non- 
married workers choose to work on the night shift. A measure of 
aggressiveness may falsely appear to be a valid predictor of 
sales records because the more aggressive salesman pushes him¬ 
self into the advantageous sales-territories. 

While the possibility that opportunity bias may be test- 
correlated should be thoroughly checked, it is probably gen¬ 
erally true that the extent of the correlation will frequently be 
found to be negligible. Generally, in other words, opportunity 
bias will be test free and will attenuate or lower all validity 
coefficients but will not seriously distort their relative mag¬ 
nitude. 

A second frequently mentioned contaminating factor in pro¬ 
duction records is the one introduced by limitations on rate of 
production. Such limitations may occur because of assembly line
production, because men work in teams, because of social pres¬ 
sure from other workers or from a number of similarly operating 
factors. These are not biasing in one sense of the term. If pro¬ 
duction of the faster workers cannot exceed that of the slower 
workers by more than 50 per cent, the observed difference is 
truly representative of the full advantage to be obtained by 
hiring the fastest in preference to the slowest worker for that 
given job situation. Of course, if the effect of a change in the 
composition of the efficiency of all members of the assembly 
line—or group—could be measured, the problem would be con¬ 
siderably changed. In order to obtain a measure of such effects 
it would be necessary to depart from the usual correlational 
methods of validating tests. It would be necessary to select 
groups with differing average productivity, to assign all mem¬ 
bers of each group to a given assembly line and to compare the 
mean productivity of these groups. In such comparison of 
groups, experimental controls would have to be established;
that is, the conditions of work, and all factors influencing out¬ 
put, would need to be equalized for all groups, with variation 
between groups limited to the difference in predicted produc¬ 
tivity. While the method for handling this special problem 
bears mention, extended discussion is not possible at this point. 

While the effect of such factors is not contaminating in the 
sense indicated above, results due to the presence of such factors 
cannot, of course, be generalized to situations where such limi¬ 
tations are not present. The most obvious conclusion to be 
drawn when limitation on production is discovered, is that 
selection programs are likely to be of limited value. 

To save time and money, a large proportion of the industrial 
selection researches are conducted on in-service personnel, i.e., 
tests are administered to and criterion data are collected on 
personnel already in the employ of the sponsor. Such cross-
sectional studies, while necessary, are always, to some degree,
defective in experimental design. In practice, test scores are 
necessarily obtained prior to employment or to any other per¬ 
sonnel action based on them. The conduct of cross-sectional 
studies may introduce two types of contamination when their
results are applied to the employment situation. The first is a 
test contamination arising from the fact that both the on-the-
job experience, and the nature of the conditions under which 
the predictors are administered, may exercise considerable 
influence on test scores. This, being a predictor rather than 
criterion contamination, need not further concern us here. 

The collection of criterion data on an in-service population 
may, however, introduce an experience-contamination which is 
of direct concern to us. Where the job is one in which produc¬ 
tion may be expected to rise with increased experience and there 
is considerable variability in the tenure of the validation popu¬ 
lation, the criterion will, of course, be contaminated with 
experience. If the predictors also include experience-correlated 
variables such as age, the contamination will be predictor corre¬ 
lated. If the tests are also experience-contaminated, such tests 
will show a spuriously high correlation with the criterion. 

Validities of predictors such as information or proficiency 
tests, and knowledge of terminology, would tend to show posi¬ 
tive bias in validity in cross-sectional validation studies. 
Knowledge of terminology and productivity would both tend 
to be greater in experienced than in inexperienced workers 
even though there might be no relation between the two 
measures among workers with equal experience. Bias of this
nature may be avoided by testing prior to employment, by 
administering all tests to groups with constant amounts of 
experience, or by controlling experience statistically. The dan¬ 
ger of such bias does not have bearing, obviously, on the 
utilization of experience prior to employment for the given job 
as a predictor. 
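Of the three remedies just named, controlling experience statistically is the easiest to sketch. The example below uses the standard first-order partial correlation; the choice of that statistic and the three correlations are assumptions made for illustration, since the text does not specify a method or report figures.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial r)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))

# Hypothetical correlations in a cross-sectional sample of mixed tenure.
r_test_criterion = 0.30        # terminology test vs. production
r_test_experience = 0.60       # terminology test vs. length of experience
r_criterion_experience = 0.50  # production vs. length of experience

adjusted = partial_correlation(
    r_test_criterion, r_test_experience, r_criterion_experience
)
print(round(adjusted, 3))
```

With these hypothetical values the whole of the observed validity of .30 is accounted for by experience, and the relation between test and production among workers of equal experience is essentially zero.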

Estimates of the reliability of production criteria are probably
more often, and more seriously, distorted by biasing factors
than are validities. Bellows (1) has pointed out that in
many jobs where unequal opportunity seriously affects the
production records of a category of workers, it is likely that a
second measurement of the productivity of these workers will
be obtained with the same biasing factors in operation and with
the same workers showing spuriously high productivity. For
example, if production records were obtained during two different
intervals on a population of workers including those on day
and night shifts, it would probably be found that day-shift
workers produced more during both time intervals. The apparent
reliability of the production measure would be quite high
even though its actual reliability were below usual standards of
acceptability.
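The inflation of apparent reliability can be sketched with a simple variance model. It is assumed here, for illustration only, that each production record is the sum of a stable true component, a stable shift (opportunity) component and a random error that is independent from one interval to the next; the variance figures are hypothetical.

```python
def retest_reliability(var_true, var_error, var_stable_bias=0.0):
    """Correlation between two measurement intervals under the additive model.

    Any component that is stable across the two intervals, whether true
    productivity or a persistent bias such as shift, adds to the correlation.
    """
    stable = var_true + var_stable_bias
    return stable / (stable + var_error)

# Hypothetical variance components.
apparent = retest_reliability(var_true=1.0, var_error=1.0, var_stable_bias=3.0)
bias_free = retest_reliability(var_true=1.0, var_error=1.0)

print("apparent reliability, shifts pooled:", apparent)
print("reliability with shift bias removed:", bias_free)
```

Under these assumptions the pooled records correlate .80 across intervals, although the measure distinguishes workers on the same shift with a reliability of only .50.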

The construction of rating scales free of contamination 
presents, possibly, more serious problems than detection and 
elimination of contamination from production records. It 
should be stressed initially that all of the sources of bias dis¬ 
cussed in connection with production records will probably also 
tend to influence ratings of productivity. It is possible, however, 
that raters may be successful in making allowance for some of 
these factors—opportunity biases, for example—and thus 
reduce their influence. 

The most obvious and probably the most serious source of 
contamination peculiar to ratings arises because of the so-called 
halo effect. 

The term “halo” implies a spurious relationship between
rated traits, attributed to a spread of the effect of the rater’s
attitude toward, or estimate of, the ratee in one dimension
over to his attitude toward, or estimate of, the ratee in other,
unrelated, dimensions. Various factors have been postulated as
the source of the halo effect. Degree of personal liking is frequently
mentioned as a possible source. Over-all impression,
social prestige and outstanding achievement in a particular 
field are other possible sources. As yet there is no evidence allow¬ 
ing definite conclusions regarding the source of the halo effect. 
It may be regarded as established, however, that some factor 
or factors operate spuriously to increase the relationship be¬ 
tween ratings on different characteristics. 

Since the source of halo cannot be established, it cannot be 
regarded as necessarily a contaminating factor. Bingham (2),
in discussing the role of halo in criterion ratings, expresses the 
belief that there are a number of situations in which the general 
impression that the individual makes on those he comes into 
contact with, can itself be an important criterion element. He 
concludes that halo should not, in all cases, be considered an 
undesirable attribute of criterion ratings. 

It would be agreed by most, however, that Bingham’s con¬ 
clusion, even if correct, gives no sound solution to the problem 
of halo in criterion ratings. Even though halo reflects important
elements of on-the-job proficiency, it would be desirable to ob¬ 
tain adequate estimates of proficiency in the various aspects of 
the job, free of halo, in order to insure that separate job ele¬ 
ments are properly weighted in arriving at an over-all composite. 

Halo effect, if contaminating in nature, can become test 
correlated and thus assume considerable importance, particu¬ 
larly when the prediction battery includes ratings, personality 
measures and ability tests. In such a situation, the criterion 
and predictor ratings may show spuriously high correlation 
because of halo effect common to both. Personality measures 
may likewise show spuriously high validities through the pre¬ 
diction of the contaminating halo element. Since, at the same 
time, validities of ability-test scores would probably be attenu¬ 
ated, the partial regression weights for the entire battery would 
be considerably distorted. The tendency reported by Bingham 
and Freyd (3) for personality measures to show relatively higher 
validities against rating criteria and for objective tests to show 
relatively higher validities against production record criteria, 
may be explained, at least in part, by the biasing effect on the 
validities of personality measures noted above. It is particu¬ 
larly important to note that direct criterion contamination 
may result from a remote source. Variables which influence
criterion ratings need not be members of the prediction battery 
in order to distort the validities and regression weights. If the 
variables which influence the criterion scores are correlated 
with any in the battery, the resultant criterion contamination 
will be test correlated; if such variables are uncorrelated with 
members of the predictor battery, the contamination will be 
predictor free. 

A source of criterion contamination in ratings similar in its 
effect to opportunity bias arises from differences in the mean 
values obtained from different raters. Employees of a tough 
rater will receive lower criterion scores than will those of an 
easy rater. Normally, the resulting contamination will be test 
free. However, if assignment to various supervisors is made on
the basis of test scores, such bias can be predictor correlated. 
Conrad (4) has contended that such differences in rater tend¬ 
ency are over-emphasized and that proper rating techniques 
will minimize such differences. 
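The effect of differences in rater means can be shown with a few invented ratings. The sketch below expresses each rating as a deviation from its own rater's mean; this adjustment is an illustrative assumption and is not a procedure proposed in the text, and the rater names, employee labels and scores are hypothetical. It should be noted that such an adjustment would also remove any real difference in average proficiency between the groups rated.

```python
from statistics import mean

# Hypothetical ratings on a common scale, grouped by the rater who made them.
ratings = {
    "tough_rater": {"emp_1": 3, "emp_2": 4, "emp_3": 5},
    "easy_rater": {"emp_4": 6, "emp_5": 7, "emp_6": 8},
}

def center_within_rater(ratings_by_rater):
    """Express each rating as a deviation from the mean of the rater who gave it."""
    centered = {}
    for scores in ratings_by_rater.values():
        rater_mean = mean(scores.values())
        for employee, score in scores.items():
            centered[employee] = score - rater_mean
    return centered

centered = center_within_rater(ratings)
print(centered)
```

On the raw scale every employee of the easy rater outranks every employee of the tough rater; after centering, the lowest-rated employee of each rater receives the same score.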





A basic source of contamination in ratings arises from the 
failure of raters or of rating-scale constructors to distinguish 
between those observations which constitute direct evidence 
of productivity and those which give only inferential evidence 
of productivity. To this source of criterion contamination the 
authors would like to give the name “error of illation.” Thus,
ratings on the efficiency of a carpenter based on observations 
on the skill with which he uses his hands, the air of assurance 
with which he handles tools or even the correctness of his choice 
of tools for each operation, are all inferential and without em¬ 
pirical evidence cannot be assumed to have high relationship to 
actual productivity. Even though such relationship were estab¬ 
lished, it could not be assumed that such trait ratings could be 
substituted for direct measures of productivity without biasing 
effect on the validation results. 

In designing forms that incorporate scales for the measure¬ 
ment of such traits as manual skill, industriousness and ambi¬ 
tion, the psychologist promotes this form of contamination. 
Ratings on such traits give rise to the danger that resulting 
evaluations may not only have been inferred, but that they 
may have been inferred, in large part, from events observed in 
a social situation or in other situations having no necessary 
relation to on-the-job productivity. Evidence on the highly
specific nature of psychological traits from studies by Hartshorne
and May (10) is pertinent in showing the dangers of
such bias. 

The tendency of raters to consider the symptoms of productivity
rather than productivity itself can probably never be
entirely eliminated. It should be possible in many work situa¬ 
tions, however, to identify the individuals with the greatest 
opportunity to observe the actual production element and to 
orient the scales so that the evaluations given by the rater 
involve as few deductions and as much direct observation as 
possible. The directions and content of the scales can be so 
oriented that they specifically request the rater to base his 
evaluation on direct observation of results. Even though it is
improbable that the desired purpose will be entirely accomplished,
the technician should at least not be guilty of encouraging
a tendency toward inference rather than direct observation
by phrasing his directions and scales in terms of indirect or 
inferential content. 

Having constructed scales oriented toward direct evidence 
of productivity and having determined who is in the best posi¬ 
tion to evaluate each directly, the technician may take one 
additional step to help reduce errors of illation. The raters 
may be instructed, well in advance of the collection of the criterion
ratings, to observe and to take note of behaviors falling
in the areas to be rated. Mention should be made of the fact 
that such oriented observation is an integral part of the “criti¬ 
cal incident” technique mentioned above. 

The bias introduced by the illation error is probably very 
often test correlated. Trait ratings obtained prior to employ¬ 
ment might give excellent prediction of ratings of traits thought 
desirable for efficient performance but not actually related to 
quantity and quality of production. If so, nothing will have been 
demonstrated. Personality tests related to the traits thought 
desirable would similarly yield inflated validity coefficients. 

The danger of contamination in the use of achievement-test
scores as criteria of success in training or in school is
considerable. Probably, also, such contamination will be test
correlated. Frequently, information tests are employed along 
with aptitude and other measures to predict achievement in 
training. Such achievement is also measured by an informa¬ 
tion test administered at the end of training. Test constructors 
working on both the predictor- and criterion-information tests 
may well employ the same source material for constructing 
items and may well both err in the same direction in selecting 
items irrelevant to or unimportant in the actual training process.
Such common but irrelevant content in the predictors and
criterion can naturally be expected to produce test-correlated 
contamination. 
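
The distinction between test-correlated and test-free contamination can be illustrated with a small calculation. The sketch below is not drawn from the studies cited. It assumes a hypothetical linear model in which predictor and criterion may both contain an irrelevant component K, and the function names and all numerical values are illustrative assumptions.

```python
import math

def observed_validity(a, b, k, noise_var):
    """Correlation of predictor X with the contaminated criterion Y.

    Hypothetical model (T, K, e uncorrelated; T and K of unit variance):
        X = a*T + b*K + e      predictor; e has variance noise_var
        Y = T + k*K            criterion; T is true productivity,
                               K is irrelevant (contaminating) content
    """
    cov_xy = a + b * k
    var_x = a * a + b * b + noise_var
    var_y = 1.0 + k * k
    return cov_xy / math.sqrt(var_x * var_y)

def true_validity(a, b, noise_var):
    """Correlation of X with the uncontaminated criterion T (k = 0)."""
    return observed_validity(a, b, 0.0, noise_var)

# Predictor shares the irrelevant content K with the criterion (b > 0).
test_correlated = observed_validity(a=0.3, b=0.6, k=1.0, noise_var=0.55)
# Predictor is free of K (b = 0): the same contamination only adds noise.
test_free = observed_validity(a=0.3, b=0.0, k=1.0, noise_var=0.91)

print(round(true_validity(0.3, 0.6, 0.55), 3))  # 0.3
print(round(test_correlated, 3))                # 0.636
print(round(test_free, 3))                      # 0.212
```

Under these assumed values, irrelevant content shared by predictor and criterion roughly doubles the apparent validity, whereas contamination absent from the predictor merely lowers it.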

It is probable that a similar biasing effect is often obtained 
in relating any ability-test measures to success in training. 
Generally speaking, ability-test scores have shown uniformly 
high validities in this area. Such validities are suspect, however, 
since they are obtained by relating initial test scores to measures 
of proficiency after training. Woodrow (17) has shown that 
initial test scores show little relation to improvement with 
practice. He has also shown that general-intelligence-test scores
(often interpreted as measures of learning ability) have little
if any relation to improvement in scholastic achievement. Since 
the essential problem is the prediction of benefit derived from 
training, lack of evidence contradictory to that reported by 
Woodrow suggests that predictors of training success or train¬ 
ing improvement have doubtful validity for that purpose. The 
selection instruments may, of course, still have value in pre¬ 
dicting on-the-job success. To assume such validity, knowing 
only that the predictors correlate with estimates of achievement 
in training, assumes that achievement in training is highly 
related to on-the-job success. Little, if any, research has been 
reported demonstrating a positive relationship between training 
success and later success on the job. The low correlation of the
academic achievement of West Point Cadets (8) with later
success as Army officers argues strongly that training success
cannot be assumed to have appreciable relationship to success
on the job.

Job-sample criteria are possibly less subject to contamination 
than any of the criteria discussed. Opportunity can be carefully 
controlled. Halo, effect of easy-hard raters, etc., can be reduced 
to a minimum. Ratings of work products, while subjective, 
differ in character from ratings of individuals. In rating work 
products, raters need not know the individuals whose products 
are being evaluated and the effect of personal likes and dislikes 
of the rater can thus be eliminated. 

However, because of the similarity between the test-like 
character of the situation under which the job-sample meas¬ 
ures are obtained, and the usual conditions under which tests 
in a predictor battery are administered, contamination, test- 
correlated in nature, is probably often present in job-sample 
criteria. Individuals who become overexcited or nervous in the 
one situation may tend to show the same type of behavior in the 
second. Similarly, individuals who put forth greater effort when
being watched would be apt to do so both in the situation in
which tests are administered and in that in which job-sample
measures are obtained. It is possible, also, that if tests and job-sample
performances are obtained on the same day, factors peculiar to the
day of testing will act as test-correlated contamination and 
introduce a positive bias into the validity coefficients. 

A type of criterion scale in which the possibility of contamination
is easily overlooked is that in which “high” and “low”
groups are employed. For some not-too-apparent reason, investigators
seem to feel that by selecting extreme groups they
have circumvented the problem of contamination and have
secured “pure” cases. It is strongly emphasized that the selection
of such groups is based on a continuum, either actual or
implicit. Extremes on this continuum may be extreme on a
contaminating element as well as on the “true score” component
of this continuum. Such measures should be as carefully
scrutinized for contamination as any continuous criterion. 

Investigators often show a similar tendency to neglect prob¬ 
lems of bias in using a group-membership criterion. In the 
situation in which members of one occupation are compared 
with members of other occupational groups, or with the 
general population, the opportunity for criterion contamination 
is extensive. Where the “in” group has been test selected and
the same or correlated predictors are included in the experi¬ 
mental battery, presence of extensive test-correlated contamina¬ 
tion is almost certain. 

Even where prejudices, rather than tests, dictated entrance 
into the occupational group, predictor-correlated contamination 
may be expected. If an executive, for example, arbitrarily ruled 
that all messengers coming into the firm should be high-school 
graduates, and employed messengers were compared with some 
general group, education and educational achievement tests 
would show substantial validity even though their true validity 
were negligible. 
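
The messenger example can be worked through numerically. The counts below are invented for illustration (the 40 per cent graduation rate in the general group is an assumption), and `phi` is simply the ordinary fourfold point correlation between group membership and graduation. No productivity data enter the calculation at all.

```python
import math

def phi(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts. Rows: employed messengers, general comparison group.
# Columns: high-school graduates, non-graduates.
messenger_grads, messenger_nongrads = 50, 0   # hiring rule admits graduates only
general_grads, general_nongrads = 20, 30      # assumed 40 per cent graduates

apparent_validity = phi(messenger_grads, messenger_nongrads,
                        general_grads, general_nongrads)
print(round(apparent_validity, 2))  # 0.65
```

The coefficient is produced entirely by the arbitrary hiring rule, which is the sense in which the group-membership criterion is contaminated.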

The composition of occupational groups is determined by 
factors determining the initial choice of occupation and by 
attrition after such initial choice. Factors responsible for choice 
of occupation are almost certainly a source of contamination; 
those responsible for attrition may be of value for criterion 
purposes. It is not usually possible to obtain any reasonably
exact information concerning the major factors in either case. 
Because of this lack of information, if for no other reason, such 
a criterion is suspect. 

The preceding discussion of contamination could not be 
completely comprehensive. In any individual research study, 
contamination peculiar to that study may be discovered. We
have endeavored, however, to clarify and illustrate the nature
and effect of the more important general factors. 

Criterion Scale Unit Bias 

While the presence of scale-unit bias in criteria has fre¬ 
quently been recognized, particularly in connection with rat¬ 
ings, the general problem has not been extensively explored. 
A review of the psychological literature provides little evidence 
allowing an estimate of the prevalence or seriousness of scale- 
unit bias in the criteria of validation studies. 

Basically, it is believed that the problem centers in the ab¬ 
sence of an adequate rationale. There is no generally accepted 
means of judging the presence or absence of scale-unit bias 
available to the investigator desirous of evaluating the relative 
merits of various possible types of scale units or scaling pro¬ 
cedures. 

Possibly the only widely used standard of adequacy of scale 
units is the degree of approximation of the obtained frequency 
distribution to a normal curve. While this standard may be of 
some value in avoiding serious distortion of scale units, it 
must be remembered that normality is always an assumption. 
Standards are needed that will allow checking the adequacy of 
scale units in a particular example without the necessity of such 
an assumption. From a logical viewpoint, a standard for judging
presence or absence of scale-unit bias that applies to the
shape of the frequency distribution is in any event defective.
The distribution form is a function of the population involved
as well as of the scale units. Normality should certainly not be
considered desirable where there is strong presumptive evi¬ 
dence that selection of cases has occurred. 

It is fortunate that, in general, product-moment validity 
coefficients do not seem to be seriously affected by alteration of 
scale units so long as rank order is unchanged. When test scores 
are converted to normalized form, or when ratings obtained in 
rank-order form are normalized, product-moment validities are 
usually very little altered. 
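
A small simulation, offered only as a sketch with invented data, shows the sense in which the product-moment coefficient is insensitive to scale units when rank order is preserved. The skewing transformation, the sample size, and the helper names `pearson` and `normalize` are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def normalize(values):
    """Replace each value by the normal deviate of its rank (no ties assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return z

rng = random.Random(0)          # fixed seed; simulated data only
n = 2000
test = [rng.gauss(0, 1) for _ in range(n)]
latent = [0.6 * t + 0.8 * rng.gauss(0, 1) for t in test]
criterion_raw = [math.exp(0.5 * v) for v in latent]   # skewed scale units
criterion_norm = normalize(criterion_raw)             # same rank order

r_raw = pearson(test, criterion_raw)
r_norm = pearson(test, criterion_norm)
print(round(r_raw, 2), round(r_norm, 2))
```

The two coefficients should differ only by a few hundredths here, although a more violent distortion of the units than the one assumed would produce a larger difference.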

While the product-moment validity for the entire range is
probably little affected by scale-unit distortion, validity
indexes computed for particular points of cut on the predictor may
be seriously affected. Where scale-unit bias is suspected, such
coefficients should be interpreted with caution.

We might note also that a heavily skewed criterion distri¬ 
bution, if established as genuine, would have implications of 
some significance for efficient selection. Individuals on the tail 
of a skewed distribution could undoubtedly be identified with 
greater confidence than those at the same percentile point on a
normal curve. Thus, while the problem of scale units may not 
be of great significance when conventional methods of analysis 
are employed, a solution to the problem that would allow identi¬ 
fication of highly skewed distributions with confidence could 
lead to improved efficiency of selection through different 
methods of analysis. 

From the criterion point of view, the scale-unit problem re¬ 
duces to one of establishing units which represent equal incre¬ 
ments in terms of the over-all efficiency of the organization. 
This point will be elaborated by the authors in a forthcoming 
paper on that topic. 

In terms of the efficiency of the organization, production 
records appear to be relatively free from scale-unit bias. An 
additional object produced has equal value whether it increases 
the productivity measure of an individual from 1 to 2 or from
99 to 100. A given error is just as costly no matter whether it
increases the error score from 4 to 5 or from 19 to 20. Such units
have meaning in their own right. Even in the evaluation of 
quality of production, differences in quality can be assigned 
values having direct meaning if the resulting objects of differing 
quality are eventually sold for different prices. Quality differ¬ 
ences would then acquire a quantitative monetary value. This 
cannot, however, always be accomplished. 

Ratings are subject to a number of forms of criterion scale- 
unit bias. Piling at the upper end of the scale, failure to employ 
the lower scale units, piling in the center of the scale and other 
defects have all been frequently reported in the literature. Since 
these tendencies appear in wide varieties of rating situations, it 
seems reasonably certain that they are distortions of the scale 
units and are not due to the nature of the true distribution of 
the degree of productivity in the job element being rated. 

Lack of information as to the true or proper distribution
form considerably hampers the solution to the problem of scale-
unit bias in ratings. While it seems reasonably certain that the
scale-unit biases mentioned above do often occur, it is difficult 
to judge in any particular instance when a rating scale is free of 
scale-unit bias and, more particularly, the nature of such bias 
as may be present. 

In the absence of evidence to the contrary, a normal distribu¬ 
tion of criterion rating scales would usually indicate freedom 
from scale-unit bias. If the distribution of production records, 
on the job element being rated, is known from other research 
studies, such distributions would probably provide a sounder 
basis for judging the adequacy of the distribution form of the 
criterion ratings than would the normal curve. 

In the use of order-of-merit rankings it is apparent that the 
form of the distribution is forced and that equal numbers of 
individuals fall within each interval of a given magnitude. If, 
however, rankings are obtained from a number of different 
raters it will usually be found that the average of the rankings 
will approximate the normal curve to a satisfactory degree. 
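
The behavior of averaged rankings can be examined with a rough simulation. Everything below is assumed for illustration: 200 workers, 8 raters, and raters whose perceptions equal true merit plus independent error of the same variance. Excess kurtosis is used merely as a convenient index of departure from the flat, forced distribution of a single ranking. It is not a test of normality.

```python
import random

def ranks(values):
    """Rank 1 = lowest value; ties are not expected with continuous data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a normal curve)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0

rng = random.Random(1)          # fixed seed; simulated data only
n_workers, n_raters = 200, 8
merit = [rng.gauss(0, 1) for _ in range(n_workers)]

rankings = []
for _ in range(n_raters):
    perceived = [m + rng.gauss(0, 1) for m in merit]   # each rater errs independently
    rankings.append(ranks(perceived))

single = rankings[0]
average = [sum(r[i] for r in rankings) / n_raters for i in range(n_workers)]

print(round(excess_kurtosis(single), 2))   # close to -1.2, the rectangular value
print(round(excess_kurtosis(average), 2))  # nearer 0 under these assumptions
```

If the raters agreed perfectly, the averaged ranking would remain rectangular. The approach toward a bell-shaped form depends on disagreement among raters.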

No problems of scale-unit bias arise which are peculiar to 
job-sample criteria. If job-sample criteria are scorable in pro¬ 
duction units, comments made with reference to scale-unit bias 
in production units will apply here also. If scoring is subjective, 
problems similar to those encountered in rating scales will 
occur. It seems probable, however, that scale-unit bias will be 
less extreme than that occurring in direct evaluation of indi¬ 
viduals. The direct evaluation of production has the added ad¬ 
vantage that in some cases it can be divorced from the indi¬ 
vidual to some degree and thus escape, in part at least, some of 
the biases which stem from the interpersonal relations between 
rater and ratee. 

Achievement tests employed as criteria involve scale-unit 
biases of a nature peculiar to continuous variables obtained by 
summing a number of dichotomous items. Where rating scales 
are so constructed they will also be subject to this form of scale-
unit bias. Variation in the difficulty level (i.e., percentage of
raters checking a given item) will have considerable effect upon 
the distribution form of the total score. The effect here is very 
similar to the effect of item-difficulty distribution on factor
structure of tests discussed in some detail by Ferguson (6)
and Wherry and Gaylord (16). The frequency of occurrence in
the population of a component element of a criterion scale is 
analogous, in other words, to the difficulty level of component 
items of a test insofar as the statistics of their interrelations are 
concerned. If a criterion consists of high difficulty elements, it 
will tend to correlate more highly with tests also consisting of 
high difficulty items and less highly with tests consisting of low 
difficulty items. 
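
One familiar mechanism behind this effect is the ceiling that unequal difficulty places on the fourfold point correlation (phi) between two dichotomous elements. The sketch below is offered as an illustration of that ceiling only, not as a reproduction of the analyses cited. The function name `max_phi` and the proportions used are assumptions.

```python
import math

def max_phi(p_hard, p_easy):
    """Largest phi possible between two dichotomous elements.

    p_hard and p_easy are the proportions passing (or checked on)
    each element, with p_hard <= p_easy. The maximum is reached when
    everyone who passes the harder element also passes the easier.
    """
    if not 0 < p_hard <= p_easy < 1:
        raise ValueError("need 0 < p_hard <= p_easy < 1")
    return math.sqrt(p_hard * (1 - p_easy) / (p_easy * (1 - p_hard)))

print(round(max_phi(0.2, 0.2), 2))  # 1.0   equal difficulty
print(round(max_phi(0.2, 0.5), 2))  # 0.5
print(round(max_phi(0.2, 0.8), 2))  # 0.25  widely different difficulty
```

A criterion element checked for 20 per cent of workers thus cannot correlate above .25 with a test item passed by 80 per cent, however closely the two are related in content.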

When a criterion variable consists, then, of a number of dis¬ 
crete items, the investigator should take care to insure that the 
difficulty level or the “frequency of occurrence” level corre¬ 
sponds to the frequency of occurrence of the job element in 
the work situation. To accomplish this purpose, a difficulty 
distribution should probably be determined for each set of cri¬ 
terion components, and the number of observations or measures 
at each level should be made to adhere to this predetermined 
distribution. 

Criterion Distortion 

An additional source of bias, which we have referred to as 
“criterion distortion,” arises as a result of the improper assign¬ 
ment of weights to the several elements. More broadly defined, 
criterion distortion would include all of the other types of bias 
discussed. Thus, criterion deficiency is the assignment of weights 
of zero to elements that should in reality have non-zero weights. 
Criterion contamination is the opposite error: the assignment of
non-zero weights to elements that merit no consideration. 
Criterion scale unit bias in effect assigns different weights to 
different parts of the continuum of the given criterion element. 
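
The weighting view of deficiency and contamination can be made concrete with a toy composite. The element names, scores, and "proper" weights below are hypothetical and chosen only so that the arithmetic is easy to follow. Scale-unit bias is not represented.

```python
def composite(scores, weights):
    """Weighted sum of criterion elements; missing weights count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())

# Hypothetical worker and hypothetical "proper" weights.
scores = {"quantity": 50.0, "quality": 80.0, "sociability": 90.0}
proper = {"quantity": 0.5, "quality": 0.5, "sociability": 0.0}

deficient = {"quantity": 1.0}                        # quality wrongly weighted zero
contaminated = {"quantity": 0.5, "quality": 0.5,
                "sociability": 0.5}                  # irrelevant element weighted

print(composite(scores, proper))        # 65.0
print(composite(scores, deficient))     # 50.0
print(composite(scores, contaminated))  # 110.0
```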

A number of techniques have been proposed for determining 
the proper weights for criterion elements. We may do well to 
examine several of the procedures that have been proposed and 
to investigate the type of situation in which each is most ap¬ 
propriate. 

Horst (11) and Edgerton and Kolbe (5) have proposed procedures
which, in effect, operate to maximize the reliability of
the over-all criterion. The assumption implicit in these tech¬ 
niques is that all criterion elements measure the same basic 
variable and that the lack of perfect correlation between them
is attributable to error of measurement. This procedure would
thus be quite applicable to situations in which the criterion con¬ 
sisted of several measures of the same attribute, such as ratings 
by different observers of the same trait. It seems evident, however,
that this technique should never be employed in combining
elements which attempt to assay behavior on different continua.
Unfortunately, the technique has often been used for
this latter purpose. It is the authors’ opinion that the chief
advantages of employing techniques developed by mathemati¬ 
cal derivation lie in the thorough and explicit manner in which 
the assumptions must be stated. If the assumptions used are 
ignored in applying the technique or formula developed, the 
mathematical development is, in a sense, disadvantageous in 
that it lends prestige to a formula completely unsuited to the 
particular application. 

Where no objective basis exists for the establishment of the 
relative weights of criterion elements, weights obtained by
Toops’ (14) method of guessed Beta weights are, in the authors’
opinion, superior to an unweighted raw or standard score sum.
Toops proposes averaged estimates of the judged importance 
of the various criterion elements as a means of weighting, the 
judges being those personnel in the sponsoring agency having 
the best knowledge of the implications of various criterion 
elements for the efficiency of the organization as a whole. 
There are a number of technical problems involved in making 
clear to the judges the proper basis for guessed Betas. Consider, 
for example, the problem of obtaining weights for combining 
the number of production units and the number of errors. 
Should the evaluations requested be phrased so that raw-score 
weights are obtained or so that standard-score weights are 
obtained? Since the judges will probably not understand the 
effect of differences in the standard deviation of criterion ele¬ 
ments on their effective weighting, how can bias from this 
source be avoided? In spite of these problems in technique the 
method provides a direct approach to the basic problem of 
weighting criterion elements. In addition, its sponsor acceptability
should be high. These factors, in the authors’ opinion,
suggest the advisability of a more extensive use of this 
technique. 
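
The raw-score versus standard-score question can be seen in a short calculation. The sketch assumes two roughly uncorrelated criterion elements with invented standard deviations. The function names are hypothetical and are not part of Toops' procedure.

```python
def effective_weights(nominal, sds):
    """Relative pull of each element on the composite: weight times SD.

    Assumes roughly uncorrelated elements, a simplification made here
    for illustration only.
    """
    return [w * s for w, s in zip(nominal, sds)]

def raw_weights_for(betas, sds):
    """Raw-score weights that reproduce the judges' standard-score betas."""
    return [b / s for b, s in zip(betas, sds)]

# Hypothetical figures: production units (SD 20) and errors (SD 2).
sds = [20.0, 2.0]
judged = [1.0, -1.0]   # judges: equally important, errors counting against

# Judged weights applied directly to raw scores.
print(effective_weights(judged, sds))                        # [20.0, -2.0]
# Judged weights first divided by the standard deviations.
print(effective_weights(raw_weights_for(judged, sds), sds))  # [1.0, -1.0]
```

If the judges' figures are applied directly to raw scores, production counts ten times as heavily as errors in this example, contrary to their stated intention. Dividing each figure by the element's standard deviation restores the intended balance.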

It should be stressed that the common practice of computing
separate validity coefficients for the various subcriteria is
equivalent, in the final analysis, to a method of combining 
criterion scores. It differs in that the experimenter avoids a
formal procedure. Instead, he merely looks at the validities 
against the several criteria and decides on the tests which are 
to constitute the selection battery. Such a procedure has the 
effect of concealing from the research worker himself the fact 
that he is deciding the relative importance of the sub-criterion 
variables. Usually, the investigator will decide to include sev¬ 
eral tests for the prediction of each of the criteria, and will fail 
to consider the relative importance of the criteria or to evaluate 
properly the effect of the intercorrelations and validities or the 
partial regressions for predicting a composite. The problem is 
thus evaded rather than solved. In general, a formal solution 
will at least make explicit the basis for the decisions concerning 
the relative importance of the several criteria and will avoid 
incidental errors which may creep in because of carelessness in 
the subjective handling of the data. 

A suggestion by Otis (12) may, in particular instances, lead
to a more meaningful combination of subcriterion scores than 
would result from the application of any of the procedures so 
far mentioned. Otis pointed out that, in key-punch operation, 
it was discovered that the correction of an error required the 
time equivalent to that needed for punching 14 cards. He 
suggested, consequently, that a total over-all production index 
could readily be obtained simply by subtracting 14 cards for 
every error made. 
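
The index suggested by Otis amounts to a one-line computation. The function name and the two operators' figures below are hypothetical. Only the penalty of 14 cards per error comes from the text.

```python
def production_index(cards_punched, errors, cards_per_error=14):
    """Net output in card units: each error costs the time of 14 cards."""
    return cards_punched - cards_per_error * errors

# Hypothetical operators.
print(production_index(1000, 10))  # 860
print(production_index(900, 2))    # 872
```

On this index the slower but more accurate operator ranks above the faster one, a result that neither subcriterion would show when taken alone.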

The method of combining criteria suggested in this particular
instance is not exactly a technique and does not suggest any
uniform procedure that can be widely employed. It does sug¬ 
gest, however, that detailed examination of the relationship 
between the different work units measured and the organization 
of the over-all productive process will often suggest that certain 
different sub-criteria are, or can be, expressed in units which 
are equivalent in their effect upon the total productivity of the 
organization. 

The effect of the use of inappropriate weights for criterion 
elements, as with other forms of bias, will depend upon the 
extent to which it is predictor free or predictor correlated. The
overweighting of any given element will naturally afford undue
weight to the predictors that have the highest correlation with 
the overweighted element or elements. Conversely, the pre¬ 
dictors that correlate highest with underweighted elements 
would be given inadequate weight. Prediction would hence be
distorted; in a selection problem, for example, the
predictors would align the population in accord with the criterion
as weighted, but this alignment would be at variance with
the “true” criterion.

The reader may readily judge that the authors consider 
most procedures for criterion combination in current use to be 
not wholly adequate. This appears to be an area particularly in 
need of further research. 


Summary 

This paper proposes a classification of criterion bias into 
four main categories: 

1. Criterion deficiency

2. Criterion contamination

3. Criterion scale unit bias 

4. Criterion distortion 

Each category is discussed in terms of the steps in the criterion-construction
process in which it is most likely to occur, and
each is briefly related to the several kinds of criterion
measures. The importance of distinguishing between bias that is
test free and bias that is test correlated has received continual
emphasis.

Biasing factors reported in the literature have been con¬ 
sidered. Additional concepts have been advanced by the 
authors. 

REFERENCES 

1. Bellows, R. M. “Procedures for Evaluating Vocational Criteria.”
Journal of Applied Psychology, XXV (1941), 499-513.

2. Bingham, W. V. “Halo, Invalid and Valid.” Journal of Applied
Psychology, XXIII (1939), 221-228.

3. Bingham, W. V. and Freyd, M. Procedures in Employment Psychology.
New York: Shaw, 1926.

4. Conrad, H. S. “The Personal Equation in Ratings: A Systematic
Evaluation.” Journal of Educational Psychology, XXIV
(1933), 39-4^

5. Edgerton, H. A. and Kolbe, L. E. “The Method of Minimum
Variation for the Combination of Criteria.” Psychometrika,
I (1936), 183-187.

6. Ferguson, G. A. “The Factorial Interpretation of Test Difficulty.”
Psychometrika, VI (1941), 67-77.

7. Flanagan, J. C. “Critical Requirements: A New Approach to
Employee Evaluation.” Personnel Psychology, II,
419-425.

8. Gaylord, R. H. and Russell, E. “West Point Evaluative Measures
in the Prediction of Officer Efficiency,” in preparation.

9. Guilford, J. P. “New Standards for Test Evaluation.” Educational
and Psychological Measurement, VI (1946), 427-438.

10. Hartshorne, H. and May, M. A. Studies in Deceit. New York:
Macmillan Co., 1928. Page 414.

11. Horst, P. “Obtaining a Composite Measure from a Number of
Different Measures of the Same Attribute.” Psychometrika,
I (1936), 53-60.

12. Stead, W. H., Shartle, C. L., et al. Occupational Counseling
Techniques. New York: American Book Co., 1940.

13. Stuit, D. B. (Ed.) Personnel Research and Test Development in the
Bureau of Naval Personnel. Princeton: Princeton University
Press, 1947.

14. Toops, H. A. “The Selection of Graduate Assistants.” The Personnel
Journal, VI (1928), 457-472.

15. Toops, H. A. “The Criterion.” Educational and Psychological
Measurement, IV (1944), 271-297.

16. Wherry, R. J. and Gaylord, R. H. “Factor Pattern of Test Items
and Tests as a Function of the Correlation Coefficient: Content,
Difficulty, and Constant Error Factors.” Psychometrika,
IX (1944), 237-244.

17. Woodrow, H. “Interrelations of Measures of Learning.” Journal
of Psychology, X (1940), 49-73.



AN INVESTIGATION OF TWO HYPOTHESES REGARDING
THE NATURE OF THE SPATIAL-RELATIONS
AND VISUALIZATION FACTORS 1

WILLIAM B. MICHAEL 
Princeton University 
and 

WAYNE S. ZIMMERMAN and J. P. GUILFORD 
University of Southern California 

Primarily as a consequence of the factorial analyses of tests 
of intellectual abilities, the construct of a spatial and/or visual 
ability amenable to psychological measurement has received 
increasing attention in recent years. During the past twenty-
one years, at least a score of investigators have identified in
their writings a space factor. In a pioneer study, Thurstone 
(16) included among his seven primary mental abilities a fac¬ 
tor labelled S, which he characterized as a “facility in spatial 
and visual imagery,”—a factor which he likened to the spatial 
or visual group factor found by Kelley (13) in earlier experi¬ 
ments. The same factor was identified in other studies carried 
out subsequently by Thurstone (17) and by Thurstone and 
Thurstone (18). 

During World War II members of the psychological research 
units of the Army Air Forces devoted a considerable amount 
of time and effort to the development of tests of the “spatial- 
visual” type to be used in the selection of men for air-crew posi¬ 
tions. Several factorial studies which have been described in a 
research report of the AAF Aviation Psychology Research 
Program, edited by Guilford (3), have indicated that the variance
associated with Thurstone’s spatial-visualization factor
may be separated into two apparently independent factors
identified as spatial relations and visualization (visual manipulation).
In fact, in addition to these two factors (abbreviated
by the symbols S1 and Vz), two other less definite space factors,
S2 and S3, and a factor tentatively identified as visual
memory, also appeared in several analyses.

1 The first-mentioned author wishes to express his sincere appreciation to the
Social Science Research Council for kindly making available a grant-in-aid for the
completion of this investigation. The authors are indebted to Professor L. L. Thurstone,
who generously granted permission to have several of his tests offprinted in order that
they might be included within the battery. Grateful acknowledgment is made to the
staff of the Department of Psychology at Rutgers University for their interest in the
study, their cooperation in making subjects available, and their assistance in administering
a number of the tests. To all Rutgers students who participated, special thanks
are extended.

In two recent studies both Fruchter (2) and Dudek (1) have
found separate factors of spatial relations and visualization. In 
his investigation as to the nature of verbal fluency Fruchter 
reanalyzed a sub-matrix of twenty tests selected from the 
battery of fifty-seven variables employed by Thurstone in his 
classical study previously cited (16). He found two indepen¬ 
dent factors which he described as being spatial-relations and 
visualization. 

Referring to the same Thurstone study, Zimmerman (3) 
pointed out that further rotations of the residual axis (Number 
XII) with other axes which defined meaningful factors would 
produce a promising factor of visualization. Just recently, Zim¬ 
merman (in his unpublished doctoral dissertation) has rero¬ 
tated the twelve centroid axes for all fifty-seven variables and 
has confirmed his initial belief that both a spatial-relations and 
a visualization factor would appear. 

Problem 

The purpose of the investigation was to test the validity of 
two (apparently unrelated) hypotheses that purport to repre¬ 
sent differences in the psychological properties of the factors 
of spatial-relations and visualization as reflected by correspond¬ 
ing differences, both in the respective contents of two types of 
tasks and in the respective work procedures required of the sub¬ 
jects for successful completion of them. Each type of task con¬ 
sisted of a group of three tests. Within each of the two groups 
of tests employed in the study there appeared to be not only 
a similarity in the format of the test items, but also a common 
approach or operation demanded of the examinee. 

In broad outline the plan followed in the investigation was 
to incorporate within a test battery two groups of tests which 
the investigators believed to be representative of the psychological
operations involved in the hypothetical statements as to
the nature of the spatial-relations factor and of the visualiza¬ 
tion factor. In the selection of each test to be incorporated 
within a group, introspection was freely employed as an aid to 
the determination of the psychological processes used in the 
subjects’ performance upon a test—the same processes sup¬ 
posedly as those indicated in the relevant hypotheses. To these 
six tests (i.e., two groups, each consisting of three tests) were 
added eight reference tests of fairly well-known factorial con¬ 
tent to aid in the identification of those portions of variance in 
the six tests that were associated with other factors such as 
verbality, numerical facility, reasoning, and perceptual speed. 
The inclusion of other factor tests served not only to identify 
what probably without their presence would be large amounts
of specific variance within each of the six tests, but also to in¬ 
dicate the relative degree of purity of each of these six tests with 
respect to the function it was hypothesized to measure. 2 

A sufficient, though not necessary, condition for the tenability
of each of the hypotheses would be that in the factor-
analysis procedure each of the two groups of three tests would 
define a factor. Moreover, this factor should not appear to be 
weighted in other tests of the battery that were selected to 
measure other factors. If one or more tests within either group 
should be weighted substantially in variance associated with 
another factor, the evidence for the corresponding hypothesis 
would be less clear-cut, but not necessarily lacking. It would be 
quite possible, if not almost certain, that one or more of the 
three tests within a given group might be factorially complex. 
At the same time, however, all three tests within a given group 
might contain substantial amounts of variance in one factor 
that did not appear in any of the other eleven tests. 

Hypotheses 

The factor of spatial relations was hypothesized to represent
the ability to comprehend the arrangement of elements within
a visual stimulus pattern, primarily with reference to the human
body. Thus, an important implication in the ability to perceive
spatial arrangements is that the subject is able to distinguish
whether one object is higher or lower, left or right, or
nearer or farther than another within the same field. Through
the presentation of two simulated views of a stimulus pattern,
a test item may be constructed such that there is a systematic
relationship between the order of elements within the first
spatial pattern (the stimulus component of a test item) and the
order of elements within the second pattern (the response
component of a test item).

2 It was also thought to be very desirable to determine whether tests of the type
used by the AAF and Thurstone's tests held in common factors identified as being the
same. This is the first study the writers know of that will serve to check upon the belief
that many of the Thurstone primary abilities and the AAF factors are identical. Only
the Thurstone space factor is here called into question.

For example, in Thurstone’s Cubes test the examinee is asked 
to recognize whether the designs on the sides of a second cube 
can hold the same relationship to one another as they do on the 
first cube. By noticing within each cube the left-right, top- 
bottom, and front-back interrelationships of the faces, the sub¬ 
ject is able in each item to refer the locations of three designs 
on three exposed faces of one cube to the locations of designs 
on the faces of the other cube. In Thurstone’s Flags test the 
examinee is required to tell whether the exposed faces of two 
American flags of identical size can represent the same side of 
the flag. Relating corresponding left-right and top-bottom 
boundaries (outlines) of the two flags appears to be an impor¬ 
tant aspect of the solution. Similarly, in Guilford and Zimmer¬ 
man’s test of Spatial Orientation a premium is placed upon the 
examinee’s maintaining the correct relationship of objects to 
one another in background scenery that has been viewed twice 
from a motorboat—first before and then after its prow has 
moved up or down and/or left or right. In the test the examinee 
is asked to determine the relative amount and direction of 
movement of the boat corresponding to changes in the two 
views of the background setting.

The factor of visualization was hypothesized to represent an 
ability that requires the mental manipulation of visual images. 
In contrast to another factor identified as visual memory (3), 
which appears to be a static or reproductive form of visual¬ 
ization, the factor referred to as visual manipulation, or simply 
visualization, is dynamic. This visual manipulative ability appears
to be present in the solution of problems in which the individual
finds it necessary mentally to move, rotate, turn, twist,
or invert one or more objects. Following the performance of the 
presented manipulation the individual is required to recog¬ 
nize the new position, location, or changed appearance of the 
object or objects. 

Three tests selected to yield evidence for the second hypothesis
included two by Thurstone, Punched Holes and Form Board, and one
by Guilford and Zimmerman, Spatial Visualization. In the test of
Punched Holes the examinee is presented
a symbolic representation of a folded sheet of paper into which 
one or more holes have been punched and is required to imagine 
where the holes will be when the sheet is unfolded. In the 
second Thurstone test the examinee apparently finds it neces¬ 
sary in each item mentally to turn, rotate, or invert two or 
more flat geometric figures in such a way that they can be 
placed together to fit within the outline of a larger geometric 
figure. In each of the tests, the examinee is asked to record the 
final positions respectively of the holes and of the geometric 
figures. In the test of Spatial Visualization the subject is re¬ 
quired mentally to turn, tilt, or rotate a three-dimensional 
object—an alarm clock—drawn on a sheet of paper into a final 
position according to written instructions. As alternative re¬ 
sponses the pictures of the clock are presented in five positions, 
one of which is correct. (A more detailed description of these 
three tests follows in the next section.) 

Whereas in the two Thurstone tests the examinee is required 
to draw in his solution to the problem, in the third test he 
merely selects as his solution one of five choices presented. It
is quite likely that in addition to measuring visual manipu¬ 
lative ability other factors are involved in the three tests—fac¬ 
tors reflecting the manner in which responses to the items are 
recorded. 

Another important difference in the nature of the psycho¬ 
logical processes hypothesized for the spatial relations and 
visualization factors was that of speed of response. As indicated 
by findings in the AAF Aviation Psychology Program, the 
tests thought to measure the spatial relations factor were ad¬ 
ministered with fairly short time limits, but those tests thought 
to measure visualization were given with fairly liberal time
allowances. The spatial relations factor was considered to demand
a fairly rapid decision on the part of the examinee as to
the spatial position of objects with reference to his own loca¬ 
tion; whereas, the visualization factor was believed to be rep¬ 
resented in problems requiring a more deliberate and less auto¬ 
matic approach. In part, such a distinction may be a function 
of the complexity of a task (i.e., the number of steps entering 
into the performance of an item), the more complex tasks re¬ 
quiring visualization for their solution. 

Concerning the psychological properties of spatial-relations 
and visualization factors, one other important difference has 
been suggested in the work of one of the psychological research 
units of the AAF, as follows: 

The idea for Flight Orientation [a test] was proposed at
the time Aerial Orientation [another test] was being developed.
It was hypothesized (1) that the ability visually to maneuver
an airplane as if from a position outside the cockpit is a
manipulatory-visualization ability and (2) that the ability to
imagine maneuvers taking place as if the examinee were within the
cockpit is a spatial-orientation ability.

The Aerial Orientation test utilized cockpit views of outside 
terrain to be matched with depicted plane attitudes; the 
visualization-of-maneuvers tests involved only views of air¬ 
planes seen from a position outside of the cockpit. . . . Flight 
Orientation was designed to fulfill the requirements of the 
indicated variation—a test that would utilize only cockpit 
views of outside terrain. From hypotheses given above, it 
follows that Aerial Orientation should measure a combination 
of manipulatory-visualization and spatial-orientation abilities, 
while Flight Orientation should be a purer measure of the 
ability to orient in space (3). 

That the two groups of tests selected for investigating the 
validity of the hypotheses may actually contain variance in 
both the spatial-relations and visualization factors would not 
be surprising, inasmuch as many subjects on the basis of their 
own introspective reports revealed that they made use of the 
two psychological processes associated with the respective hy¬ 
potheses in tests selected to represent the implications of only 
one hypothesis. For example, if in the Flags test the subject 
is able, so to speak, to pick up the flag, move it, turn it about 
as if he actually has a model in his hands, then visualization is 
believed to be dominant. On the other hand, if the subject is
concerned primarily with the left-right and top-bottom orientation
of edges of flags with respect to his own position, or if he
has to move himself to a different position as in cocking his 
head to one side, then a spatial factor is believed to be more 
prominent. 

Similarly, in the Cubes test if the subject reports he picks 
up the first cube and rotates it into a final position which 
matches (or cannot match) the second cube, then the visual¬ 
ization process is dominant. However, if he attempts primarily 
to interrelate the positions of the sides of the cubes with respect 
to his own position, or if he appears to project himself amidst 
the cubes as if he were walking about them and relating the 
locations of various sides with respect to his own position, then 
the spatial-relations factor is probably operative. It may well 
be that in the spatial-relations factor empathy plays an im¬ 
portant role in the relating of the position of objects to one's 
own location; whereas in visualization the individual obtains 
first from a distance an overall view of the objects to be manip¬ 
ulated and then employs perhaps some rather restricted kines¬ 
thetic imagery in the imagined use of hands for moving the 
objects into their required positions. 

Despite the apparent differences in approach employed by 
many subjects, it did appear that the two groups of tests 
chosen represented reasonably well a distinction between the 
psychological processes hypothesized. If a test did involve to a 
substantial degree the use of two or more psychological abili¬ 
ties, it was thought that the factor-analysis procedure would 
reveal such a fact. 

Tests 

In Table 1 are presented the names of the fourteen pencil-and-paper
tests employed in the battery, the maximum number of items that
could be attempted, the plan followed with respect to "speed" or
"power" time-limit, the actual working time allowed, and the
scoring formula used. The numbering of the tests in the tables, as
well as in the following description of content and procedure,
corresponds to the order of administration. During the first,
second, third, and fourth testing periods, respectively, the
following groups of tests were administered: 1, 2, 3, and 4;
5, 6, 7, 8, and 9; 10 and 11; 12, 13, and 14.
An ample number of practice exercises preceded the
main body of each test. Further information concerning several 
of the tests may be found both in a manual (11) and in the 
literature (12, 16, 18, 20). It is believed, however, that the 
descriptions given will suffice for the interpretation of the fac¬ 
tors to be presented. 
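
The scoring formulas listed in Table 1 (R, R-W, and R-W/4) all take the form of rights minus some fraction of wrongs. The following is only a minimal sketch of that arithmetic; the function name `formula_score` and the counts of rights and wrongs are illustrative assumptions and do not come from the study.

```python
def formula_score(rights, wrongs, divisor=None):
    """Score a test as R - W/divisor.

    divisor=None reproduces the rights-only formula (R),
    divisor=1 gives R-W, and divisor=4 gives R-W/4.
    """
    if divisor is None:
        return float(rights)
    return rights - wrongs / divisor


# Hypothetical examinee: 20 right, 8 wrong (omitted items count as neither).
print(formula_score(20, 8))             # formula R     -> 20.0
print(formula_score(20, 8, divisor=1))  # formula R-W   -> 12.0
print(formula_score(20, 8, divisor=4))  # formula R-W/4 -> 18.0
```

The divisor is left as a parameter because the battery does not tie it to the number of response alternatives in any single way.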

TABLE 1

The Test Battery: Descriptive Data

                                                         Timing Plan        Working          Scoring
No.  Name of Test                                 Items  (Speed or Power)   Time             Formula

 1.  Guilford-Zimmerman Verbal Comprehension      3 <>   Power              10 min.          R-W/4
 2.  Guilford-Zimmerman General Reasoning         >4     Power              13 min.          R-W/4
 3.  Guilford-Zimmerman Numerical Operations      HO     Speed              5 min.           R-W
 4.  Guilford-Zimmerman Perceptual Speed          18     Speed              3 min.           R-W
 5.  Guilford-Zimmerman Spatial Orientation       60     Speed              8 min. 45 sec.   R-W/4
 6.  Thurstone [Verbal] Completion                30     Power              7 min.           R-W/4
 7.  Thurstone Number Series                      20     Power              8 min.           R
 8.  Thurstone Identical Forms                    40     Speed              3 min.           R-W
 9.  Thurstone Cubes                              42     Speed              5 min. 15 sec.   R-W
10.  Thurstone Flags                              48     Speed              4 min.           R-W
11.  Guilford-Zimmerman Spatial Visualization     40     Power (limited)    ij min.          R-W/4
12.  Thurstone Punched Holes                      10     Power              7 min.           R
13.  Thurstone Pattern Analogies                  20     Power              10 min.          R-W/4
14.  Thurstone Form Board                         28     Power              7 min.           R

1. Guilford-Zimmerman Verbal Comprehension. This is a vocabulary
test in which the examinee is required in each item to
choose among five words, all matched with respect to difficulty,
the one word which most closely approximates the meaning of
the stimulus word. Items increase in difficulty progressively
from the beginning to the end. Even-numbered items were
omitted. Responses were recorded on a separate answer sheet.
Most examinees attempted all items.

2. Guilford-Zimmerman General Reasoning. This test is composed
of arithmetical-reasoning problems similar to those encountered
in courses in general mathematics, elementary algebra,
and intermediate algebra. Diagrams accompany a few of
the problems. Numerical work is kept to a minimum. Five
multiple-choice responses are presented with each problem
statement. Items increase in difficulty level progressively from
the beginning to the end. Even-numbered items were omitted.
Responses were recorded on a separate answer sheet. Most
examinees attempted all assigned items.

3. Guilford-Zimmerman Numerical Operations. This test is
in four parts, consisting of numerous simple problems (of about 
the same difficulty level) involving respectively the four funda¬ 
mental operations of addition, subtraction, multiplication and 
division. Emphasis is placed in the directions upon the need 
for both accuracy and speed of work. Subjects were told to 
begin with the part upon addition, to work every item, and to 
go as far as possible in the allotted time. Only a few subjects 
reached the fourth section upon division. Responses to the 
items were printed in spaces on the test booklet adjacent to 
the problems. 

4. Guilford-Zimmerman Perceptual Speed. This test requires
the examinee to match a visual object of a familiar shape and
of detailed design with one of five other visual objects of a
common category (e.g., automobiles, boats, hats, shoes). Four of
the five response objects resemble rather closely the stimulus
object, but differ from it in certain minor details of shape
and/or design. For each common category two parallel sets of visual
objects—four stimulus and five response objects—are arranged 
in two parallel columns. To each one of the four stimulus 
figures in the first column corresponds one of the five response 
figures. Thus, four responses are scored for each item of homo¬ 
geneous content. All items represent a low level of difficulty. 
Answers to the items were marked on the test booklet in spaces 
adjacent to each stimulus object. The examinees were told to 
go as far as possible in the allotted time. No examinee finished. 

5. Guilford-Zimmerman Spatial Orientation. This test requires
an examinee to determine how the position of a boat
has changed in a second picture from its initial position in a first 
picture. In each picture the prow of the motorboat, in which 
the examinee is told to pretend to be riding, is shown along 
with background scenery consisting of water, or a silhouetted 
shore line, and in some instances of other boats intervening
between the shore line and the prow of the motorboat, which is
in the extreme foreground of the picture. In the sample prob¬ 
lems described in detail in the directions, the position of the 
prow in the second picture, with respect to the spot of back¬ 
ground sighted over it in the first picture, is taken as the 
primary reference guide for determination of the direction and 
amount of subsequent up-down and/or left-right motion of the 
boat. Movement is also indicated by accompanying shifts in 
the location of elements within the pattern of visible back¬ 
ground scenery. The boat is actually stationary with respect 
to any forward-backward motion. To each set of two pictures 
five alternative responses are presented. Each response is rep¬ 
resented by (1) a dot designating the aiming point, the initial 
spot in the background sighted right over the point of the prow 
in the first picture, and (2) an arc (of about 45°) representing
the location of the prow in the second picture with reference to 
the aiming point. One of the five responses shows the correct 
change in position of the prow of the boat with respect to the 
aiming point. All examinees were instructed in the limited 
time allowed to attempt as many items as possible. As in all 
other speed tests, answers were recorded in the test booklet. 
The difficulty of the items tends to increase for items further 
removed from the beginning of the test. No one attempted 
every item. 

6. Thurstone [Verbal] Completion. This test is one adapted
from the Psychological Examination of the American Council on 
Education. Representing, probably, a combination of verbal 
comprehension and verbal fluency, it presents for each item the 
definition of a word, the number of letters in the word, and 
five alternative letters (responses), one of which represents the 
initial letter of the defined word. Although the items differ con¬ 
siderably with respect to difficulty, most of the defined words 
are familiar to college students. Responses were recorded on the 
page of test items. Nearly every subject attempted all items. 

7. Thurstone Number Series. Found to be loaded in a factor
identified by Thurstone as induction, this test requires the
subject to determine a rule for each item. Numbers are presented
in a row with two blanks inserted. The task is to find
the mathematical principle by which the number series
is formed and to insert in the blank that number which is
appropriate. The difficulty level of items increases in relation to
the position of the item from the beginning of the test.
Responses were recorded on the test sheets in the blanks inserted
at various positions within the different number series. Most of
the subjects attempted all items. One point of credit was given
to each blank correctly filled (two points per item being
maximum score).

8. Thurstone Identical Forms. This test resembles rather
closely the fourth test, Perceptual Speed, in that the examinee 
selects from a row of five similar appearing figures that one 
which is exactly the same as the stimulus figure. Slight differ¬ 
ences in color design and in shape appear among the five re¬ 
sponse figures. In this test the items are also homogeneous with 
respect to difficulty. The number corresponding to the se¬ 
quential position of the response selected was recorded on the 
test page in a box to the right of the row of response objects. 
Only a few examinees reached the last few items. 

9. Thurstone Cubes. In this difficult test the subject is asked
whether two drawings can represent the same cube on each face 
of which there is supposed to be a different design. In each of 
the two drawings the designs of three faces of the cubes are 
always exposed. If the two drawings can represent the same 
cube, a plus sign is placed in a blank square to the right of the 
two drawn cubes. If, on the other hand, the second drawing can¬ 
not represent the cube of the first drawing, then a negative sign 
is placed in the adjacent square. In the short time allowed no 
one attempted all items. 

10. Thurstone Flags. On this test two flag pictures, of the
same size and of identical design, are presented occasionally in 
the same position, but generally in different positions. If the 
two drawings represent the same face of the flag, a plus sign 
is placed in a square on the test sheet just to the right of the 
two flags. If the two drawings represent opposite faces of the 
same flag, a minus sign is placed in the adjacent square. As in 
the test of Cubes the items were homogeneous with respect to 
difficulty. However, they were easy for most subjects. A few of 
the subjects attempted all items during the short period of time 
allowed. 





11. Guilford-Zimmerman Spatial Visualization. This is a
test in which the examinee attempts to imagine the movement 
of a clock in space from an initial position to a final position as 
directed by a verbal statement. The test is divided into three 
parts. In the first part, one movement of the clock is required 
to effect the final position; in the second part, two movements 
are called for, and, in the third part, three movements are in¬ 
dicated by the directions accompanying each item. Three types 
of movements are required. Each type of movement refers to 
the revolution of the clock about an axis in one of three di¬ 
mensions. The actual movement involves a revolution of the 
clock to the right or to the left a specified number of degrees. 
The word “turn” is used to designate a revolution about the 
base or the “6-12” axis where the numbers refer to the numerals
representing hours on the clock. When the clock is tilted
such that the top moves either forward or backward, or in other
words, when the clock is revolved about the “3-9” axis, the 
word “tilt” is employed. When the clock revolves about an 
axis perpendicular to its face, the word “rotate” is used. In the 
second part, two different types of movement are required, and 
six permutations of sequence of movement are used. In the 
third part, the same sequence of movements is followed in all 
items (rotate, tilt, and turn). Nearly all of the subjects failed 
to complete the entire test, but about 80 per cent attempted 
all items in the first two parts. Items were scored up to the 
point at which 67 per cent of the group attempted them. 

12. Thurstone Punched Holes. Each item in this test consists
of a series of figures representing a square sheet of paper
that has been folded by steps (as indicated by dotted lines) into
smaller square, rectangular, or triangular shapes. One or more
holes are punched into the final folded form. The task for the 
subject is to imagine where the holes will be when the sheet is 
unfolded. As an aid to the subject's performance in the more 
difficult items one or more figures representing the appearance 
of the sheet of paper at intermediate stages of unfolding are 
presented. On the unfolded (square) sheet the subject indicates 
by drawing small circles where the holes will be. In the scoring 
of the item all holes must be properly spaced in relation to one 
another if credit is to be given. An item was scored right or
wrong (no partial credits were given). Nearly every subject
completed all the items.

13. Thurstone Pattern Analogies. Adapted from similar tests
in the American Council on Education series, this test is composed
of items each of which consists of eight figures. The first
three (stimulus) figures are labelled A, B, C, and the next five
(response) figures are designated 1, 2, 3, 4, and 5. After the
examinee determines the rule by which figure A is changed to
figure B, he applies the rule to figure C and picks out among
the five arabic-numbered responses that one which satisfies the
requirements of the problem. In the more complex items the
examinee may frequently change his hypothesis as to the principle
connecting A and B in view of limitations imposed by the
nature of the five response figures. In the time allowed, most
subjects completed all items.

14. Thurstone Form Board. Almost identical with the Minnesota
Form Board Test, except for the inclusion of printed instructions
and a practice exercise, this test consists of items
made up of several two-dimensional pieces (colored black) of 
various geometrical shapes which the examinee attempts to fit 
together in an appropriate arrangement within a larger geo¬ 
metric form (uncolored figure within an outline). The subject 
draws lines within the large white (uncolored) design to show 
how the black pieces can be placed in order to fit within the 
outline. Extreme accuracy in drawing was not required, but the 
solution had to be indicated clearly. No partial credits were 
given. Although the items became increasingly difficult as one 
approached the end of the test, very few subjects failed to 
attempt all the items in the time allowed. 

The Sample 

To a group of 500 male students enrolled in a two-semester 
course in beginning psychology at Rutgers University the bat¬ 
tery of fourteen pencil-and-paper tests was administered. Since 
four class periods, spread over the last part of the first semester 
and the first part of the second semester of the academic year, 
were required for completion of the project, many of the sub¬ 
jects were not present at all class sessions. Makeups were given 
in several instances. Complete results were obtained for 360
subjects. These individuals appeared to be a representative
sample of the University student body in light of biographical 
information obtained from each student. Consisting of 220 
freshmen and sophomores and 140 juniors and seniors majoring 
in virtually every department of the University, the sample was 
deemed satisfactory. Approximately 54 per cent of the subjects 
were veterans of World War II. The ages of the subjects ranged
from 16 to 34, the median age being 22 years. 

In order that a satisfactory degree of interest might be sus¬ 
tained throughout the duration of the study, all students were 
told that they would be given their scores upon completion of 
testing in profile form. In fact, most subjects received scores on
those tests completed during the first two class periods at the 
beginning of the third period. It was thought that additional 
motivation might be provided if the temporal interval between 
taking the tests and receiving the scores was not too long. 

The Factor Analysis

The matrix of test intercorrelations (all product-moment) 
presented in Table 2 was factor-analyzed by Thurstone’s cen¬ 
troid method in the usual manner with one minor exception 
(13). In the reflection of signs the criterion was that used by 
the workers in several of the psychological research units of the 
United States Army Air Forces during World War II. The 
algebraic sum of a column, with the diagonal entry disregarded, 
was employed instead of the mere number of negative signs ap¬ 
pearing in a column. This procedure not only tends to guaran¬ 
tee positive sums but also appears to approximate more closely 
the maximizing of table totals than does the criterion involving 
number of negative signs. 
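
The reflection criterion just described may be illustrated by a short sketch. It is not the AAF workers' actual procedure. The function name `reflect_signs`, the small example matrix, and the rule of reflecting the variable with the most negative column sum first are all assumptions made for illustration. The source states only that the algebraic column sum, diagonal disregarded, is the quantity examined.

```python
def reflect_signs(matrix):
    """Reflect variables until every column sum, diagonal excluded, is >= 0.

    Returns the reflected matrix and a list of +1/-1 signs, one per variable.
    """
    r = [row[:] for row in matrix]  # work on a copy
    n = len(r)
    signs = [1] * n
    while True:
        # Algebraic column sums with the diagonal entry disregarded.
        sums = [sum(r[i][j] for i in range(n) if i != j) for j in range(n)]
        j = min(range(n), key=lambda k: sums[k])
        if sums[j] >= -1e-12:
            return r, signs
        # Reflecting variable j reverses its row and its column; the
        # diagonal entry is reversed twice and so keeps its sign.
        for i in range(n):
            r[j][i] = -r[j][i]
            r[i][j] = -r[i][j]
        signs[j] = -signs[j]


# Hypothetical 3 x 3 table in which the first variable needs reflecting.
table = [[1.0, -0.5, -0.4],
         [-0.5, 1.0, 0.3],
         [-0.4, 0.3, 1.0]]
reflected, signs = reflect_signs(table)
print(signs)  # [-1, 1, 1]
```

Each reflection of a column with a negative sum raises the total of the off-diagonal entries, so the loop cannot cycle. This agrees with the remark that the criterion tends to maximize table totals.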

Because of marked discrepancies between obtained commu- 
nalities in the first set of centroid extractions and the estimated 
communalities in the diagonals, a second set of extractions was 
required. Following the second extraction (of seven centroid 
factors) the obtained communality of no test differed more than 
| .07 | from the second estimated communality. 
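
The comparison of obtained with estimated communalities can be sketched as follows. The loadings and diagonal estimates below are hypothetical, as are the function names. Only the tolerance of .07 is taken from the text.

```python
def communalities(loadings):
    """Sum of squared loadings across factors, one value per test."""
    return [sum(a * a for a in row) for row in loadings]


def max_discrepancy(obtained, estimated):
    """Largest absolute difference between obtained and estimated values."""
    return max(abs(o - e) for o, e in zip(obtained, estimated))


# Hypothetical loadings of three tests on two centroid factors.
loadings = [[0.6, 0.3], [0.5, 0.5], [0.2, 0.7]]
estimated = [0.50, 0.45, 0.55]  # hypothetical diagonal estimates

obtained = communalities(loadings)
# A further set of extractions is called for only if some test differs
# from its estimate by more than the tolerance of .07.
needs_second_extraction = max_discrepancy(obtained, estimated) > 0.07
print(needs_second_extraction)  # False
```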

[Table 2. Matrix of product-moment intercorrelations among the fourteen
tests. The entries are not legible in this copy.]

The criterion employed for cessation of extraction of the centroid
factors was also that used by workers in the psychological
research units of the AAF; namely, that factoring should not
cease until the product of the two highest factor loadings is at 
least less than the standard error of the corresponding correla¬ 
tion. Such a criterion tends to yield a greater number of factors 
than do most other criteria. The rationale underlying this less 
stringent criterion is that the maximum contribution which the 
factor makes to the scalar product of two test vectors, or to the 
correlation between two tests, is no greater than the chance 
relationship expressed by the standard error of the correlation 
coefficient. 
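
A minimal sketch of this stopping rule follows. The source does not say which expression for the standard error was used, so the large-sample formula (1 - r^2)/sqrt(N) is an assumption. The use of absolute values of the loadings and both function names are assumptions as well.

```python
import math


def standard_error_r(n_subjects, r=0.0):
    """Assumed large-sample standard error of a correlation r."""
    return (1.0 - r * r) / math.sqrt(n_subjects)


def factor_is_negligible(loadings, n_subjects, r=0.0):
    """True when the product of the two largest loadings (in absolute
    value) falls below the standard error of the correlation."""
    top_two = sorted((abs(a) for a in loadings), reverse=True)[:2]
    return top_two[0] * top_two[1] < standard_error_r(n_subjects, r)


# With N = 360 the assumed standard error of a zero correlation is
# about .053.
print(round(standard_error_r(360), 3))                   # 0.053
print(factor_is_negligible([0.25, 0.20, 0.10], 360))     # True:  .05 < .053
print(factor_is_negligible([0.30, -0.20, 0.10], 360))    # False: .06 > .053
```

Because the threshold shrinks as the sample grows, a sample of this size permits factors with rather small loadings to be retained, in keeping with the remark that the criterion yields more factors than most.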

Following the completion of a set of trial rotations, it was 
considered advisable to extract two more centroid factors as an 
aid to further rotations. It was known that probably only six 
factors would be meaningfully identified. However, previous 
experience has indicated that use of additional centroid axes 
in the rotation process frequently brings about, more readily, 
a psychologically meaningful solution. The superfluous factors 
eventually appear as mere residuals (factors containing insignif¬ 
icant amounts of communality) to which no interpretation can 
be dependably given. Moreover, the presence of residual factors 
seldom interferes at the conclusion of the rotation procedure 
with the interpretation of those principal factors which account 
for most of the common-factor variance. 

Fifty-six rotations of pairs of axes were required to satisfy 
Thurstone’s criteria of positive manifold and simple structure. 
Each rotation was achieved graphically according to the method 
devised by Zimmerman (19). In general the structure deter¬ 
mined the direction and magnitude of each new rotation. Infor¬ 
mation concerning the content of tests was put to use only to¬ 
ward the end of the rotation procedure when minor adjustments 
were made. In view of the large number of rotations the differ¬ 
ences between the communalities of centroid factors and final 
rotated factors were negligible, the largest two discrepancies 
being .017 and .013. An orthogonal reference frame appeared to 
suffice for the interpretation of the factors. The final rotated 
factor loadings are shown in Table 4. 
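
Although the rotations in the study were performed graphically, a single orthogonal rotation of one pair of axes has a simple numerical equivalent, sketched below. The function name `rotate_pair`, the angle, and the loadings are hypothetical. The sketch also shows why communalities should be unchanged by such rotations apart from rounding.

```python
import math


def rotate_pair(loadings, p, q, degrees):
    """Rotate factor axes p and q orthogonally through the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    rotated = [row[:] for row in loadings]  # leave the input untouched
    for row in rotated:
        a, b = row[p], row[q]
        row[p] = a * c + b * s
        row[q] = -a * s + b * c
    return rotated


# Hypothetical loadings of two tests on two factors.
before = [[0.6, 0.3], [0.5, 0.5]]
after = rotate_pair(before, 0, 1, 45)

# The second test now lies along the first axis, and each test's sum of
# squared loadings (its communality) is the same before and after.
for old, new in zip(before, after):
    h2_old = sum(a * a for a in old)
    h2_new = sum(a * a for a in new)
    print(round(h2_old, 6) == round(h2_new, 6))  # True
```

Discrepancies such as the .017 and .013 reported would then arise from accumulated rounding and graphical reading over many successive rotations, not from the rotation itself.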

Interpretation of Factors 

Inspection of the final rotated factor loadings in Table 4 re¬ 
veals that on the whole the criteria of positive manifold and 



TABLE 3 


SPATIAL-RELATIONS AND VISUALIZATION FACTORS 203 


cn d *ovo ODO oo Nrt h CTnoo d vh 
d ^-ca ^oo Ch m o cl I>0 \m 

vo 4 ^viW 3 V\vnvin^*t>‘ v O nvo 


rooo m > +c*)h-coSH CM « d O 

H MD Os oo vo VO h VO Cl CO h-i t"~ r-' QV 

hhOhoOOmOOOOOOO 

ill ii mi 


vo oo co OVD m d O O Ov *o ^vo £> 
ro ^ovo c-j d vo oOnmoovo o 
OOmOOOmOOOmO^*-' 

I II I I I I 


H l-i Cl cl m «-000 C»> 

O co n QNd c-^oo cr«t h d 

1 I II I I I I 


n ctn h h o*o h ti oh dvo 

nod o d moovovo d covo r^-vo 

Q)-lMdMMOOOlHCll-ll-H»-l 


i i i i i 


cl oo OO W-, CO i- c*->oo d m O n d n 

vo vo ci r- co t— CT' ^ vo os r^o no 

Owi-iO^t-tOOOOOddcl 


i i i ii i 


oo VO Q O O Os ^ >o n Os DO d 
CO Os O VO vo o VI Ncl r-cod O '*#■ 
c^Hiicli-icodclOMOvHi-iw 

l i i l i 


n t> vm ovvo hh o >-« o vo w -*4-oo 

ov vo m vo o hH d >** Qs O 

d o vo CO d co ^ no o n« O H 

II II I I 


O O oo oo co r-*vo O co CTs vn 

tj-o i-j vo r^q as vo oo n vn « 

Vv, rn d d Cl d d d o 1-1 M ci >-H d 

ill ii 


^-ci n r^co onh on r~-vo o w d 
no m n m vo d d '-•o 

d *o >-» -* 4 -vo io vo vo vo vo r-vo tJ-vo 


2 u-g : g.l : : : : 
-Sj E^'c j§ : » : : 
a o &S g : : 

|'g o : : 

i( 2 Taugi 3 «»S : : 

Oil u m'IS - * 

j§a a-s-e e §_s & 
sjS”s gJlJ.2 3.5 
co£,z; 3 apq 


H ci n ^ vovo r-oo Os o H £■"> ^4- 


















ac >4 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 




* Decimal points omitted. 

| Comnmnalities based on rotated factor loadings expressed to three decimal places. 


















































SPATIAL-RELATIONS AND VISUALIZATION FACTORS ■ 20 5 


simple structure have been fulfilled. Six rotated factors were 
meaningfully identified as visualization (Vz), verbality (V), 
numerical facility (N), general reasoning (R), spatial-relations 
(S), and perceptual speed (P). Two other factors (V1 and V2)
appeared that could not be satisfactorily defined, although their
weights in certain tests were suggestive of possible interpretations.
A ninth factor turned out as a residual with loadings
ranging from -.08 to +.13.

Inasmuch as the primary purpose of the study centered about
the investigation of the factors of spatial relations and visualization,
the discussion relating to the identification and meaning
of the other four factors will be kept to a minimum. The factors
V, N, and P are actually doublets. However, since the
factorial content of the pairs of tests weighted in these three 
factors was well known in advance of their inclusion within the 
battery, there is little reason to doubt the correctness of the 
identification given. 

It should be pointed out that the major loadings in some 
tests describing these three factors tended to be somewhat 
smaller than those reported in other studies or in manuals. This 
is due to the fact that many of the tests were shortened in order 
that they might be given within the time period available for 
testing.³ However, in view of the size of the sample (N = 360),
loadings of .35 or greater are probably indicative of the presence 
of a significant amount of variance in a factor. 
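
The effect of test length on a loading can be illustrated with the correction formula cited in the footnote to this paragraph, read here as k_mn = k_m1 * sqrt(n / (1 + (n - 1) r_11)). The Python sketch below is an illustration only: the function name, the lengthening ratio, and the reliability are hypothetical values chosen for the example and are not figures from the study.

```python
import math


def lengthened_loading(k_m1: float, n: float, r_11: float) -> float:
    """Estimate the loading of a factor in a test lengthened n-fold.

    k_m1 : loading of the factor in the original (unlengthened) test
    n    : ratio of the length of the new form to the original form
    r_11 : reliability of the unlengthened test
    """
    if n <= 0:
        raise ValueError("n must be positive")
    denominator = 1.0 + (n - 1.0) * r_11
    if denominator <= 0:
        raise ValueError("1 + (n - 1) * r_11 must be positive")
    return k_m1 * math.sqrt(n / denominator)


# Hypothetical case: an obtained loading of .698 in a test that is then
# doubled in length, with an assumed reliability of .80.
estimate = lengthened_loading(0.698, n=2.0, r_11=0.80)
print(round(estimate, 3))
```

With n = 1 the function returns the obtained loading unchanged, and with n greater than 1 and a reliability below unity the estimate rises, which is the direction of the correction described in the footnote.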

Somewhat greater attention should probably be given to the 
interpretation of the factor R. Two tests, General Reasoning and 


³ It is possible, however, to estimate what the loadings of these three factors, as well
as the loadings of the other factors, would be if the tests were not shortened (9). When
a test is homogeneously changed in length the new factor loadings may be estimated
by the formula

$$k_{mn} = k_{m1}\sqrt{\frac{n}{1 + (n - 1)\,r_{11}}},$$

where $n$ = number of times the test has been lengthened, or the ratio of the length of
the new form to the original form;
$k_{m1}$ = loading of factor $m$ in the original, or unlengthened, test;
$k_{mn}$ = loading of factor $m$ in the lengthened, or new, form of the test;
$r_{11}$ = reliability of the unlengthened test.

If the shortened experimental forms of tests (1), (2), (3), (4), (5), and (11) are
considered to be extended to their original length, the corrected loadings in the principal
factor in each test are estimated to be, respectively, .712, .564, .657, .673, .587, and
.631, compared to the obtained loadings of .698, .537, .642, .664, .578, and .619 (which
are rounded to two figures in Table 4). The assumption is made in the speed tests that
the number of items completed per unit time remains constant.





Number Series, are loaded in this factor to the extent of 
and .42, respectively. In view of the small number of items contained
in the shortened form of the first test (fourteen in all)
and of the consequent limitation imposed on the reliability of
the test, the magnitude of the first loading is substantial. Although
the factor may be tentatively described as relating to some type 
of reasoning function, it is not clearly defined. That it may rep¬ 
resent an ability to grasp the essential steps involved in the 
solution of problems presented in quantitative or symbolic 
terms appears to be a plausible interpretation. 

Interesting to note is the fact that factor V1 is loaded .39
and .42 in the two tests Number Series and Pattern Analogies 
respectively. A highly speculative interpretation would suggest 
that this factor may be that of induction previously identified 
by Thurstone (16). When the possible existence of an induction 
factor is taken into account along with the fact that the test of 
Pattern Analogies received an insignificant loading of .09 in the 
factor R, it appears even more plausible that the factor R may 
represent an ability to diagnose a problem expressed in quanti¬ 
tative terms. If the interpretation of the R factor is correct, a 
significant finding is that a test (General Reasoning) can be
constructed to measure quantitative thinking without the introduction
of substantial amounts of variance in the numerical
factor. 

Examination of the loadings for the final rotated factors I 
and VII in Table 4 reveals positive, though not conclusive, 
evidence for the existence of two reference variables which 
may be meaningfully identified as spatial-relations and visualization.
In short, the two hypotheses as set forth are, in the
main, upheld, at least to the extent that the factorial composition
of the two groups of selected tests differs.

In the following list of four tests, the first three of which 
were selected to test the hypothesis relating to the psychological 
processes involved in visualization, loadings of .35 or higher in 
all rotated factors including I (Vz), VII (S), and VI (V2) may
be summarized as follows:


Tests                          Factor I (Vz)    Other Factors

(11) Spatial Visualization         .62          .44S
(12) Punched Holes                 .52          .36V2 (.25S)
(14) Form Board                    .52          .43V2 (.22S)
(5)  Spatial Orientation           .42          .58S



In view of the presence of weights of .52 or higher in three
tests, Spatial Visualization, Punched Holes, and Form Board
(all three of which made up the group intended to represent a
measure of visualization), factor I can be identified as visualization,
even though the test of Spatial Visualization is loaded
to the extent of .44 in factor VII (S). That the spatial-relations 
and visualization abilities may be required in one or more tests 
in either of the two groups of tests inserted in the battery was 
mentioned previously as a definite possibility. 
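
Because the reference frame is treated as orthogonal, the square of a loading can be read as the proportion of a test's variance associated with that factor. The following Python sketch applies this to the two loadings quoted above for Spatial Visualization (.62 in Vz and .44 in S); the function name and the dictionary layout are illustrative assumptions.

```python
def variance_shares(loadings: dict[str, float]) -> dict[str, float]:
    """Square each orthogonal factor loading to obtain a proportion of test variance."""
    return {factor: loading ** 2 for factor, loading in loadings.items()}


# Loadings quoted in the text for the test of Spatial Visualization.
spatial_visualization = {"Vz": 0.62, "S": 0.44}

shares = variance_shares(spatial_visualization)
for factor, share in shares.items():
    print(f"{factor}: {share:.3f}")
```

On these figures roughly .38 of the variance of the test lies in the visualization factor and roughly .19 in the spatial-relations factor, which is consistent with identifying factor I by this test while acknowledging its secondary loading.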

After taking the test of Spatial Visualization, many of the 
subjects reported that in addition to manipulating mentally 
the stimulus figure (an alarm clock) into the final position 
called for by the verbal directions, they also related the loca¬ 
tion of various parts of the stimulus object (hands, numerals, 
top, base, winding and setting mechanisms of the clock) to the 
location of corresponding parts of one or more response figures 
(five alarm clocks in different positions). In the easier items 
which required only one manipulation the role of spatial cues 
is undoubtedly important. On the other hand, in those items 
requiring two or three movements of the clock, it would ap¬ 
pear that a greater dependence was placed upon manipulations 
of the clock; in fact, in the most difficult items variance asso¬ 
ciated with reasoning, verbal, and memory factors would possi¬ 
bly be important. However, only four items requiring a se¬ 
quence of three movements were scored. Nevertheless, a small, 
though perhaps insignificant, loading of .25 appeared in the R 
factor. In short, the influence of the range of difficulty of items 
upon the factorial content of a test may be substantial, as a 
previous study has shown (6). 

In two other tests, Punched Holes and Form Board, which 
were weighted heavily in the visualization factor, small load¬ 
ings of .25 and .22, respectively, appear in the factor to be 
identified as spatial-relations. More important, however, are 
the corresponding loadings of .36 and .43 in a factor V2. Although
not amenable to dependable identification, this factor
may be associated with the drawing (filling in) response re¬ 
quired of the examinees. Despite their relatively high satura¬ 
tions in the visualization factor, these two tests appear to in¬ 
volve additional unknown factors. 

The visualization factor loading of .42 in the test of Spatial
Orientation, which was chosen to represent a measure of the
spatial-relations factor, is probably indicative of the use by 
some of the examinees of visualization. Introspective reports 
from subjects differed as to the technique used in working the 
items. The variance representing visualization ability may be 
attributed to the tendency of several examinees mentally to 
manipulate the boat, as if it were a small toy, up and down 
and/or left or right and to imagine concomitant changes in the 
scenery. Many of the subjects reported that they did not place 
themselves within the boat, but viewed the boat and scenery 
as if they were on a stationary platform some distance to the 
rear of the boat. One subject said that he pretended to be 
playing with a toy boat in a pond and to be sighting along the 
prow of the boat as a means of observing shifts in background 
scenery while he moved the boat with his hand to the right or 
left and/or up or down. 

On the other hand, many, if not most, of the subjects pre¬ 
tending actually to be inside the boat, and using the prow as 
the guide, noted changes in background views with reference to 
corresponding motions of the boat. Although the test of Spatial 
Orientation appears to be weighted in both spatial-relations and 
visualization factors, it does seem to represent best a measure 
of spatial relations or spatial orientation and to vindicate its in¬ 
clusion with other tests in the battery which were selected to 
bring out the spatial factor. 

In the following list of five tests, the first three of which were
chosen to yield evidence regarding the second hypothesis, loadings
of .34 or higher were found in rotated factors VII (S), I
(Vz), and V (V1):


Tests                          Factor VII (S)   Other Factors

(5)  Spatial Orientation           .58          .42Vz
(10) Flags                         .44          (.15Vz)
(11) Cubes                         .43          (.20Vz)
(12) Spatial Visualization         .44          .62Vz
(13) Pattern Analogies             .34          .41V1 (.24Vz)


The magnitude of the weights in factor VII for the tests of
Spatial Orientation, Flags, and Cubes indicates that identification
of the factor as spatial relations is psychologically meaningful.
Despite the substantial loading of the visualization factor
in the test of Spatial Orientation (a fact which has been
rationalized previously), the first hypothesis regarding the
psychological nature of spatial relations appears to have been upheld.

Of passing interest is the loading of .34 in the spatial rela¬ 
tions factor appearing in the test of Pattern Analogies. In this 
factorially complex test, the presence of variance in the spatial- 
relations factor may have been due to the role of those changes 
in the design of complex figures, or patterns, which depended 
upon a rule involving the spatial order of parts. In the more 
difficult items of complex design it was usually helpful, if not 
necessary, to give specific attention to the spatial organization 
of the various geometric properties within each of the patterns 
appearing in the row. 

A second source for possible variance in the spatial-relations 
factor was that of the format of each item. Pattern A and
pattern B, which stood in a left-right order on the page,
corresponded to the order of pattern C and one of the five alternative
responses. Having been exposed to spatial tests administered 
earlier, many of the subjects may have transferred techniques 
previously learned in solving other items to the task required 
in the test of Punched Holes. Thus, the influence of mental set 
may have been one important reason for the appearance of the 
loading in the spatial-relations factor. 

The results of the factor analysis seem to indicate that, in 
the main, the two hypotheses have been upheld. Two of the 
final rotated factors may be readily interpreted in terms of 
their weights in two groups of tests as representing the spatial- 
relations and visualization abilities that were hypothesized. 
However, the number of tests does not appear to be large 
enough to determine with confidence whether the abilities may 
be correlated to some degree. 

Much needed, indeed, are other studies to yield further evidence
regarding the tenability of these two hypotheses. Although
two recent empirical investigations (1, 14) have indicated
that similar primary factors are obtained when the
same, or nearly the same, batteries of tests are administered to
groups chosen under different selective conditions, it is urged
that other homogeneous samples in which such variables as
age, level of educational attainment, occupational classification,
and sex membership are systematically varied be employed
to test the validity of the two hypotheses. Other hypotheses
should be formulated regarding the psychological nature
of the spatial domain and subjected to verification through use 
of specially devised tests and of other tests of known factorial 
composition. It is hoped that following more extensive research 
in the area of space and visualization relatively pure tests can 
be constructed⁴ to measure the abilities identified and that
such tests can be used with others of demonstrated merit to 
improve materially the degree of accuracy with which numerous 
complex criteria can be predicted. 

Summary 

The primary purpose of the study was to test the tenability 
of two hypotheses regarding the psychological nature of spatial- 
relations and visualization factors. A secondary purpose was to 
seek to identify certain factors found in the AAF investigations 
with certain of Thurstone’s primary abilities. Within a battery 
of fourteen tests, two groups of tests (three tests in each group) 
were included which appeared to reflect differences in the psy¬ 
chological processes associated with the spatial-relations and 
visualization abilities. In addition to the six tests expressly in¬ 
corporated within the battery to yield evidence regarding the 
validity of the hypotheses, eight reference tests of fairly well- 
known factorial content were included to aid in the identifica¬ 
tion of variance found in the six tests and to answer questions 
of identity of the Thurstone and AAF factors. 

Positive evidence for the hypotheses was to be considered 
attained if the two groups of tests defined separate factors and 
if none of the other eight tests was substantially weighted in 
factors unique to either group of tests. Moreover, none of the 
three tests in one group should contain large amounts of vari¬ 
ance in common with tests of the other group except to the 
extent that a given test might consist of items that reflected 
the presence of that factor which was defined in the main by 

⁴ Even if pure tests cannot be constructed for all factors identified in the spatial
realm, means are available for attaining estimates of univocal factor scores through
use of suppression tests (8).

tests of the other group. If a test did appear in one group that 
contained variance in the factor associated primarily with tests 
of the other group, a satisfactory rationalization of this finding 
would be required. 

Product-moment correlations computed from sets of scores 
of 360 students in the introductory course in psychology at 
Rutgers University were factored by Thurstone’s centroid 
method. Eight of these factors were rotated by graphical means 
to positions satisfying the criteria of positive manifold and 
simple structure. 

In the orthogonal system six factors were identified as verbal 
comprehension, numerical facility, perceptual speed, reason¬ 
ing, visualization, and spatial relations. In the main, the vari¬ 
ances associated with factors identified as spatial relations and 
visualization were confined to the respective groups of tests 
initially placed within the battery to bring out the factors. In 
only one test in each group of three tests were substantial 
amounts of variance found in both the visualization and spatial- 
relations factors, although the larger portion of variance was in 
the factor common to the group in which that test appeared. 

The presence of variance in these two factors was ration¬ 
alized for each of the tests. Introspective reports of the sub¬ 
jects revealed that in many items the psychological processes 
used involved both spatial-relations and visualization abilities 
as described in the hypotheses. The range of difficulty level of 
test items in one test also appeared to be an important reason 
for the appearance of two factors. 

In short, it may be concluded that the two hypotheses re¬ 
garding the psychological nature of visualization and spatial 
relations were confirmed. However, other research projects need 
to be carried out with a variety of samples before a dependable 
generalization can be made regarding the nature of these two 
abilities. Since there is some evidence of still other spatial 
abilities (3), some or all of which may be correlated, it is recom¬ 
mended that a conscientious attempt be made to formulate in 
operational terms new hypotheses and that new tests, having 
been constructed in harmony with the hypotheses, be factor 
analyzed along with other tests of established factorial content.
Once the area of space has been dependably and adequately
mapped, attention can be directed toward building
tests approximating pure measures of the identified abilities.

REFERENCES 

1. Dudek, F. J. "The Dependence of Factorial Composition of
   Aptitude Tests Upon Population Differences Among Pilot
   Trainees. I. The Isolation of Factors." Educational and
   Psychological Measurement, VIII (1948), 613-633.

2. Fruchter, B. "The Nature of Verbal Fluency." Educational and
   Psychological Measurement, VIII (1948), 33-47.

3. Guilford, J. P. (Ed.) Printed Classification Tests, Report No. 5,
   Army Air Forces Aviation Psychology Program Research Reports.
   Washington, D. C.: U. S. Government Printing Office, 1947.

4. Guilford, J. P. "Factor Analysis in a Test-Development Program."
   Psychological Review, LV (1948), 79-94.

5. Guilford, J. P. "Some Lessons from Aviation Psychology." The
   American Psychologist, III (1948), 3-11.

6. Guilford, J. P. "The Difficulty of a Test and its Factor Composition."
   Psychometrika, VI (1941), 67-77.

7. Guilford, J. P. "The Discovery of Aptitude and Achievement
   Variables." Science, CVI (1947), 279-282.

8. Guilford, J. P. and Michael, W. B. "Approaches to Univocal
   Factor Scores." Psychometrika, XIII (1948), 1-22.

9. Guilford, J. P. and Michael, W. B. "Estimates of Factor Loadings
   When a Test is Homogeneously Changed in Length."
   Psychometrika (to be printed).

10. Guilford, J. P. and Zimmerman, W. S. "Some AAF Findings
    Concerning Aptitude Factors." Occupations, XXVI (1947),
    154-159.

11. Guilford, J. P. and Zimmerman, W. S. The Guilford-Zimmerman
    Aptitude Survey. Beverly Hills, Calif.: Sheridan Supply
    Company, 1947.

12. Guilford, J. P. and Zimmerman, W. S. "The Guilford-Zimmerman
    Aptitude Survey." Journal of Applied Psychology,
    XXXII (1948), 24-34.

13. Kelley, T. L. Crossroads in the Mind of Man. Stanford University:
    Stanford Univ. Press, 1928.

14. Michael, W. B. "Factor Analyses of Tests and Criteria: A
    Comparative Study of Two AAF Cadet Pilot Populations."
    Psychological Monographs: General and Applied, 1949, No. 298.

15. Thurstone, L. L. Multiple Factor Analysis. Chicago: University
    of Chicago Press, 1947.

16. Thurstone, L. L. "Primary Mental Abilities." Psychometric
    Monographs, No. 1. Chicago: University of Chicago Press, 1938.

17. Thurstone, L. L. "The Perceptual Factor." Psychometrika, III
    (1938), 1-17.

18. Thurstone, L. L. and Thurstone, T. G. "Factorial Studies of
    Intelligence." Psychometric Monographs, No. 2. Chicago:
    University of Chicago Press, 1941.

19. Zimmerman, W. S. "A Simple Graphical Method for Orthogonal
    Rotation of Axes." Psychometrika, XI (1946), 51-55.

20. Zimmerman, W. S. "Isolation, Definition, and Measurement of
    Spatial-Visualization Abilities." Ph.D. dissertation,
    University of Southern California, 1949.



ON THE USE OF INTERACTIONS AS “ERROR TERMS” 
IN THE ANALYSIS OF VARIANCE¹

ALLEN L. EDWARDS
University of Washington 

I. 

Many psychological and educational experiments are con¬ 
cerned with two or more variables, each of which may be varied 
in two or more ways. When the variables are studied in all pos¬ 
sible combinations in the same experiment, the experiment is 
said to be of factorial design.² As an example, let us take an
experiment in which three variables are involved, A, B, and C.
Suppose that A is varied in three ways, B is varied in two ways, 
and C is varied in four ways. Then we shall have (3) (2) (4) = 
24 combinations of variables, each combination corresponding 
to a particular experimental condition. One replication of the 
experiment will thus require 24 observations and the 23 degrees 
of freedom available with one replication would be allocated in 
the following way: 


Sum of squares                                     df

Main variables:               A                     2
                              B                     1
                              C                     3

First-order interactions:     A X B                 2
                              A X C                 6
                              B X C                 3

Second-order interaction:     A X B X C             6
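
The allocation above follows from a simple rule: a main variable with k categories has k - 1 degrees of freedom, and an interaction has the product of the degrees of freedom of the variables entering it. A minimal Python sketch of that rule for the 3 X 2 X 4 example follows; the function name is an illustrative assumption.

```python
from itertools import combinations
from math import prod


def factorial_df(levels: dict[str, int]) -> dict[str, int]:
    """Degrees of freedom for every main effect and interaction of a full factorial design."""
    df = {}
    names = list(levels)
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            # An interaction's df is the product of (categories - 1) over its variables.
            df[" X ".join(combo)] = prod(levels[name] - 1 for name in combo)
    return df


levels = {"A": 3, "B": 2, "C": 4}
allocation = factorial_df(levels)
for effect, value in allocation.items():
    print(f"{effect}: {value}")
print("total:", sum(allocation.values()))
```

The total of the allocated degrees of freedom is one less than the number of experimental conditions, which is the 23 available with a single replication.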


If 240 subjects were available, then 10 could be assigned at
random to each of the 24 experimental conditions. We would
thus have 9 degrees of freedom within each of the experimental
conditions or (9) (24) = 216 degrees of freedom for the variation
of subjects treated alike. The sum of squares for the 216
degrees of freedom would be the pooled sums of squares within
groups which would be used to derive the mean square for testing
the significance of the main experimental variables, the
first-order interactions, and the second-order interaction.

¹ This paper is based upon a section of a manuscript which deals more extensively
with problems of experimental design in psychological and educational research.
I should like to acknowledge that I have incorporated into this paper the suggestions
of Dr. Paul Horst, who served as a technical consultant on the manuscript.

² It is assumed that the reader is familiar with the treatment of the analysis of
variance as given, for example, by Lindquist (6), McNemar (7), or Snedecor (8).

In general, it may be said that whenever replication is present 
within the experimental design, the within-groups or mean 
square based upon replication is the appropriate error term 
against which to test the significance of all other mean squares. 
An exception to this rule, discussed in the next section, would 
be when the categories or classifications of one of the variables 
may be regarded as a random selection from the population 
being sampled. 

Let us assume that in the experiment described the A
variable corresponds to three instructors, the B variable to two 
methods of instruction, and the C variable corresponds to four 
schools. Each instructor teaches both methods and in each of 
the four schools. We shall assume that 60 subjects have been 
selected at random within each school to serve in the experi¬ 
mental groups. The complete analysis of variance of achieve¬ 
ment scores on a standardized test given at the end of the ex¬ 
periment would result in the following sums of squares with 
associated degrees of freedom: 


Sum of squares                                     df

Instructors                                         2
Methods                                             1
Schools                                             3
Instructors X Methods                               2
Instructors X Schools                               6
Methods X Schools                                   3
Instructors X Methods X Schools                     6
Residual within groups                            216

Total                                             239


Let us further assume that all of the mean squares, obtained 
by dividing the sums of squares by the corresponding degrees 
of freedom, are significant when tested against the residual 
mean square within groups. This would mean, first, with respect 
to the main variables: that significant differences are present 
among instructors; that the two methods differ significantly; 
and that there are significant differences among schools. 
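
The tests described here amount to dividing each mean square by the residual mean square within groups. The Python sketch below shows the arithmetic; the degrees of freedom follow the design in the text, but every sum of squares is invented for illustration, and judging significance would further require tabled values of F for the stated degrees of freedom.

```python
def mean_square(sum_of_squares: float, df: int) -> float:
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


# Design constants from the text: 3 instructors, 2 methods, 4 schools,
# and 10 subjects in each experimental condition.
cells = 3 * 2 * 4
subjects_per_cell = 10
residual_df = cells * (subjects_per_cell - 1)   # 24 * 9 = 216
total_df = cells * subjects_per_cell - 1        # 240 - 1 = 239

# Hypothetical sums of squares (not data from any experiment).
residual_ss = 4320.0
effects = {
    "Instructors": (300.0, 2),
    "Methods": (250.0, 1),
    "Schools": (420.0, 3),
    "Instructors X Methods X Schools": (540.0, 6),
}

residual_ms = mean_square(residual_ss, residual_df)
f_ratios = {name: mean_square(ss, df) / residual_ms
            for name, (ss, df) in effects.items()}
for name, f in f_ratios.items():
    print(f"{name}: F = {f:.2f} on {effects[name][1]} and {residual_df} df")
```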













The interaction between instructors and methods, if signifi¬ 
cant, would mean that the differences among instructors are 
dependent upon the method used or that the difference be¬ 
tween the methods depends upon the instructor variable. A 
significant interaction between instructors and schools would 
mean that the differences observed among the instructors are 
dependent upon the schools or that the differences observed 
among schools are dependent upon the instructors. A significant 
methods and schools interaction would mean that the difference 
observed between the methods is dependent upon the schools or 
that the differences among schools are dependent upon the 
method of instruction. 

If the second-order interaction is significant, this would mean 
that the differences observed among instructors are dependent 
upon the methods and the schools; that the differences observed 
among the schools are dependent upon the instructors and the 
methods; or that the difference observed between the methods 
is dependent upon the schools and the instructors. 

Now, in view of a significant second-order interaction, our 
conclusions concerning the main variables consisting of schools, 
methods, and instructors, are somewhat limited. We know that 
there are significant differences present for these three variables, 
but we know also, from the significance of the interaction, that 
the difference observed, let us say, for methods, is to some ex¬ 
tent dependent upon the schools and instructors. 

If our interest is only in the two particular methods, the three 
particular instructors, and the four particular schools, involved 
in the experiment, then our analysis and the tests of significance 
of the various mean squares, using the residual mean square as 
an error term, are appropriate. Each mean square has been 
evaluated and the conclusions reached are definite. Examina¬ 
tion of the means for the various combinations of experimental 
conditions would probably reveal that in a particular school, 
one method is more effective than another, when used by a par¬ 
ticular instructor, and we could make recommendations ac¬ 
cordingly. 

II. 

In an experiment such as that described, however, our primary
interest may be in the difference observed between the
two methods of instruction which we have used. Furthermore,
we may wish to make recommendations beyond the particular 
schools investigated. Can we say that a particular method will 
probably be more effective, on the average, for all schools, in¬ 
cluding those we have not actually investigated? 

Let us suppose that we have selected the instructors to repre¬ 
sent particular types or personalities or abilities. The three used 
in the experiment are definitely not a random sample from any 
defined population. Nor have we selected at random from any 
population of methods of instruction; instead, we have picked 
two particular methods for investigation. But it is possible that 
we might have made schools a random variable by selecting the 
schools at random from a defined population of schools for a 
given city, county, or school district. If this had been our in¬ 
tention, of course, we would undoubtedly have taken a larger 
sample than the four schools at hand. Let us suppose, however, 
that the schools have been selected at random. 

We now have the case mentioned earlier, where one of our 
variables may be considered a random sample from a defined 
population. In this sense the schools consist merely of replications 
of the experimental design in which the main variables are the 
instructors (varied according to type) and methods. Under this 
condition the highest-order interaction involving the random 
variable may be regarded as the appropriate error term for test¬ 
ing the significance of the next lower-order interactions. But 
before proceeding on this basis, another condition must hold 
true; the interaction must be significantly larger than the resid¬ 
ual mean square within groups. It cannot, of course, be smaller 
except by chance. If it is smaller, the residual mean square 
within groups should be used in testing the significance of the 
next level of interactions. 
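
The rule just stated can be written as a small decision procedure. The Python sketch below is only an illustration: the function name is invented here, the mean squares are hypothetical, and the critical value of F is assumed to be read from a table for the appropriate degrees of freedom rather than computed.

```python
def choose_error_term(interaction_ms: float, residual_ms: float,
                      critical_f: float) -> tuple[str, float]:
    """Pick the error term for testing the next lower-order interactions.

    The highest-order interaction involving the random variable is used
    only when its F ratio against the residual mean square within groups
    exceeds the tabled critical value; otherwise the residual mean square
    is retained.
    """
    f_ratio = interaction_ms / residual_ms
    if f_ratio > critical_f:
        return "second-order interaction", interaction_ms
    return "residual within groups", residual_ms


# Hypothetical mean squares; critical_f stands in for a tabled value.
label, error_ms = choose_error_term(interaction_ms=90.0, residual_ms=20.0,
                                    critical_f=2.14)
print(label, error_ms)
```

When the interaction mean square is smaller than the residual mean square the ratio falls below unity, so the function keeps the residual mean square, in line with the remark that such a result can arise only by chance.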

Let us assume, in the present instance, that the second-order 
interaction is significant when tested against the mean square 
within groups. We now proceed to test the next level of inter¬ 
actions against the second-order interaction. Whichever ones 
of these prove not to be significant when tested against the 
second-order interaction may be combined with the second- 
order interaction to give us an error term based upon a larger 
number of degrees of freedom. 

Under the assumptions we have made, it is quite likely that,
if the second-order interaction is significant when tested against
the residual mean square within groups, some of the simple
interactions will not prove to be significant when tested against
the second-order interaction. The obvious reason for this is that 
the mean square for the second-order interaction will be larger 
than the residual mean square within groups. The F’s thus ob¬ 
tained, besides being based upon a smaller number of degrees 
of freedom, will be smaller than in the first instance. 

Let us suppose that only the simple interaction involving in¬ 
structors and methods is significant when tested against the 
second-order interaction. The non-significance of the interac¬ 
tion between methods and schools and the interaction between 
instructors and schools, of course, means that we no longer have 
any basis for inferring that the difference observed between 
methods is dependent upon the schools, or that the differences 
observed among the schools are dependent upon the methods. 
Similarly, the evidence would now indicate that the differences 
among instructors are not dependent upon the schools, or that 
the differences among schools are not dependent upon the in¬ 
structors. The sums of squares for these two interactions may 
be pooled with the sum of squares for the second-order inter¬ 
action, along with their associated degrees of freedom. The 
analysis would now take this form: 


Sum of squares                                     df

Instructors                                         2
Methods                                             1
Schools                                             3
Instructors X Methods                               2
Pooled interactions                                15
Residual within groups                            216

Total                                             239
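
Pooling consists of adding the sums of squares of the non-significant interactions to that of the second-order interaction, and adding their degrees of freedom likewise. The Python sketch below shows the arithmetic; the degrees of freedom (6, 3, and 6) follow the design, while the sums of squares and the methods mean square are hypothetical figures chosen only to make the example concrete.

```python
def pool(terms: list[tuple[float, int]]) -> tuple[float, int]:
    """Add the sums of squares and the degrees of freedom of several terms."""
    pooled_ss = sum(ss for ss, _ in terms)
    pooled_df = sum(df for _, df in terms)
    return pooled_ss, pooled_df


# Each term is (sum of squares, df); the sums of squares are hypothetical.
instructors_x_schools = (360.0, 6)
methods_x_schools = (150.0, 3)
second_order = (540.0, 6)

pooled_ss, pooled_df = pool([instructors_x_schools, methods_x_schools, second_order])
pooled_ms = pooled_ss / pooled_df

# A hypothetical methods mean square (1 df) tested against the pooled error term.
methods_ms = 700.0 / 1
f_methods = methods_ms / pooled_ms
print(pooled_df, pooled_ms, f_methods)
```

The pooled term carries 15 degrees of freedom, as in the table, and any F formed with it as denominator is evaluated on 15 rather than 216 degrees of freedom for error.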


Now, how shall we test the significance of the mean squares 
for instructors, methods, and schools? If we could assume that 
either instructors or methods constituted a random sample from 
a population of instructors or a population of methods, the in¬ 
structor and methods interaction might be considered an ap¬ 
propriate error term for testing the significance of the mean 
square for instructors and the mean square for methods. This, 
however, is not a plausible assumption. The appropriate error
term is the pooled interaction mean square based upon 15 degrees
of freedom. It does include all of the interactions involving
the variable which we have assumed to be randomly selected, 
schools. If we now test the mean squares for instructors, meth¬ 
ods, and schools, against the pooled interaction mean square, 
and, if they are significant, what conclusions can be drawn? 

It is the methods mean square that is of primary interest and 
its significance would indicate that the difference between meth¬ 
ods was not dependent upon, or could not be accounted for, in 
terms of differences in the schools. A similar statement could be 
made concerning the instructors if this mean square was sig¬ 
nificant. In view of a significant interaction between methods 
and instructors, however, it would still be necessary to qualify 
our recommendations; the difference between the methods is 
still dependent upon the instructors. But the means for the 
various instructors teaching the various methods could be ex¬ 
amined for whatever insight this might give us as to the nature 
of the interaction.³

The analysis we have described is dependent upon a number 
of considerations and these should perhaps be emphasized once 
more. If the interaction or pooled interaction mean square is to 
be used as an error term instead of the residual mean square 
within groups, it should be larger than the residual mean square. 
If it is smaller, it is so only by chance. Furthermore, it is neces¬ 
sary that the categories of one of the variables in the experi¬ 
mental design be a random selection from the population being 
sampled.⁴ In the experiment discussed, for example, it would be
necessary for the schools to be selected at random from a defined 
population of schools. In this case, the categories of the ran¬ 
domly selected variable may be regarded as replications of the 
experiment, and there is some justification for the use of the
interaction as an error term, instead of the residual mean
square.⁵

³ What if all of the first-order interactions had proved to be significant when tested
against the second-order interaction? In this case, the interaction between methods
and schools might be used to test the significance of the methods mean square, and
the interaction between instructors and schools might be used to test the significance
of the mean square for instructors. We should keep in mind that in following this
procedure, our interest is in being able to generalize concerning the methods, for
example, in the population of schools.

⁴ This condition will not be met by argument after the experiment has been carried
through to completion. For example, it would be illogical to argue that the two particular
methods of instruction selected for investigation have been randomly selected from
a population of methods.

III. 

In some complex experiments, involving many possible com¬ 
binations of experimental variables and consequently many ex¬ 
perimental conditions, replication is not used and the sums of 
squares for the higher-order interactions are pooled, along with 
the degrees of freedom associated with them, to obtain an esti¬ 
mate of experimental error (residual mean square within 
groups). The mean square thus arrived at is used in the manner 
in which the mean square based upon the variation within 
groups has been used in the experiment described, i.e., as an 
estimate of the uncontrolled variation against which to test the 
significance of the other mean squares. 

An example of this design is to be found in an experiment by 
Crutchfield (3), in which five variables were each varied in three 
ways in an investigation of “behavior potentials.” Animals were
placed in a pulling compartment in which there was a string 
arranged by pulleys to a food pan. By pulling on the string the 
animals could pull the food pan next to the compartment and 
thus eat. A friction device was used to increase or decrease the 
force required for pulling the food pan, and behavior was stud¬ 
ied under all possible combinations of the experimental vari¬ 
ables. 

Variable A was the length of the string attached to the food 
pan and this was varied by the use of 60 cm., 120 cm., and 240 
cm. lengths. Variable B was the force required to pull the food 
pan in on the training trials and this was varied by using a low, 
medium, and high setting of the friction device. Variable C was 
the number of training trials given the animals and this was var¬ 
ied by giving 30, 60, and 90 trials. Variable D consisted of the 
number of hours between the crucial test trial and the last 
feeding period. This was varied with intervals of 12 hours, 24 
hours, and 48 hours. The final variable, E, was the force required
to pull the food pan during the crucial test trial and this
was varied in the same ways as during the training trials.

⁵ This is the situation in experiments involving repeated measurements on the
same subjects, where the interactions involving subjects are used to provide an estimate
of experimental error under the assumption that the subjects have been randomly
selected from a defined population. Some of these experimental designs are described
by Grant (4), Brozek and Alexander (2) and Kogan (5).

By varying each of the five variables in three ways, a total 
of 3⁵ = 243 combinations of the variables are possible. One
replication of the experiment, assigning one animal to each ex¬ 
perimental condition, would thus require a total of 243 animals. 
Each additional replication would require another 243 animals. 
Crutchfield decided to forego any additional replications and to 
use as an error term a mean square based upon the higher-order 
interactions. 

Each of the experimental variables will be based upon 2 de¬ 
grees of freedom, accounting for a total of 10 degrees of freedom. 
The first-order interactions will each be based upon 4 degrees of 
freedom, accounting for a total of 40 degrees of freedom. The 
second-order interactions, each based upon 8 degrees of free¬ 
dom, will account for 80 degrees of freedom; the third-order 
interactions, each based upon 16 degrees of freedom, will account 
for 80 degrees of freedom; and the remaining 32 degrees of 
freedom will be associated with the fourth-order interaction. 
Crutchfield pooled the sums of squares for all interactions beyond
the first-order along with their degrees of freedom to obtain
as his estimate of experimental error a pooled interaction
mean square based upon 192 degrees of freedom.
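The accounting of degrees of freedom in this paragraph can be checked mechanically. The sketch below is an illustration only; the dictionary `levels` and the function name `df_by_order` are assumptions introduced here, not part of Crutchfield's report.

```python
from itertools import combinations
from math import prod

# Five variables, each varied in three ways
levels = {"A": 3, "B": 3, "C": 3, "D": 3, "E": 3}


def df_by_order(levels):
    """Map the number of variables in a term to the total df of all such terms."""
    totals = {}
    for k in range(1, len(levels) + 1):
        totals[k] = sum(
            prod(levels[f] - 1 for f in term)
            for term in combinations(levels, k)
        )
    return totals


totals = df_by_order(levels)
n_conditions = prod(levels.values())

# Interactions beyond the first order involve three or more variables
pooled_df = sum(df for k, df in totals.items() if k >= 3)
print(n_conditions, totals, pooled_df)
```

The five entries of `totals` (main effects through the fourth-order interaction) sum to one less than the number of experimental conditions, as they must when each condition holds a single animal.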

IV. 

Assumptions are involved, of course, in the pooling of the 
sums of squares for higher-order interactions and their associated
degrees of freedom. In the first place, it is assumed that
each of the mean squares corresponding to the higher-order 
interactions is an estimate of the same common population 
variance, i.e., the assumption of homogeneity of variance is in¬ 
volved. It is also assumed that this common variance would not 
differ significantly from the variance estimate obtained with 
replication. If the higher-order interactions are not significant— 
and without replication and a corresponding test of significance 
this must remain an assumption—then the mean square de¬ 
rived from these interactions will estimate the same variance as 
estimated by the mean square within groups. 

Under these conditions, the experimental variables, A, B, C,
D, and E, may be tested for significance by the mean square
based upon the higher order interactions. The significance of the 
first-order interactions may be tested in the same manner. If 
none of the first-order interactions is significant, this provides 
good evidence that none of the higher-order interactions will 
be significant and therefore justifies the use of the higher-order 
interactions as an error term. 

Let us suppose, however, that one of the first-order inter¬ 
actions, let us say, the interaction between variable A and vari¬ 
able B, turns out to be highly significant. If that is the case, then 
the mean square based upon the pooled sum of squares for all 
higher-order interactions is likely to be biased in the direction 
of overestimating the “pure” experimental error that would 
have been obtained from replication of the experiment. 

If the first-order interaction between A and B is significant, 
we should then isolate the sums of squares for the second-order 
interactions which involved these two variables. These sec¬ 
ond-order interactions would be A X B X C, A X B X D, 
and A X B X E. These sums of squares and their associated 
degrees of freedom would be subtracted from the pooled sum of 
squares and degrees of freedom for all higher-order interactions. 
Since each of the second-order interactions is based upon 8 
degrees of freedom, then the subtraction of the three second- 
order interactions mentioned would leave a pooled sum of 
squares based upon 168 degrees of freedom. The significance of 
the three second-order interactions in question could then be 
tested against the residual mean square based upon 168 degrees 
of freedom. 
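As a small check on the subtraction just described, the second-order interactions containing both A and B can be listed and their degrees of freedom removed from the pool. The variable names below are hypothetical and chosen only for this sketch.

```python
from itertools import combinations

factors = "ABCDE"
df_per_second_order = (3 - 1) ** 3   # 8 df for each three-variable interaction
pooled_df = 192                      # all interactions beyond the first order

# Second-order interactions that involve both A and B
isolated = ["".join(t) for t in combinations(factors, 3) if "A" in t and "B" in t]

# Degrees of freedom left in the pooled error term after isolating them
remaining_df = pooled_df - df_per_second_order * len(isolated)
print(isolated, remaining_df)
```

The same subtraction would be applied to the corresponding sums of squares before the reduced mean square is computed.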

It has been mentioned that homogeneity of variance of the 
higher-order interaction mean squares is also involved in pool¬ 
ing them to obtain a single estimate of experimental error. Each 
of the mean squares based upon a higher-order interaction 
might be found and the set tested for homogeneity of variance 
by means of Bartlett’s test (1). If the test of this hypothesis does
not result in the rejection of the hypothesis of a common vari¬ 
ance, then the pooling of the various sums of squares and de¬ 
grees of freedom is proper. 
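For readers who wish to carry out the homogeneity test on a set of interaction mean squares, a minimal sketch follows. It uses the usual form of Bartlett's statistic for k variance estimates with unequal degrees of freedom; the function name and the mean squares are hypothetical, and the statistic must still be referred to a chi-square table with k - 1 degrees of freedom.

```python
import math


def bartlett_statistic(mean_squares, dfs):
    """Bartlett's statistic for k variance estimates.

    Returns (statistic, k - 1); the statistic is approximately
    chi-square with k - 1 df when the variances are homogeneous.
    """
    k = len(mean_squares)
    total_df = sum(dfs)
    pooled = sum(ms * df for ms, df in zip(mean_squares, dfs)) / total_df
    m = total_df * math.log(pooled) - sum(
        df * math.log(ms) for ms, df in zip(mean_squares, dfs)
    )
    c = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / c, k - 1


# Hypothetical mean squares for a second-, third-, and fourth-order interaction
stat, df = bartlett_statistic([6.0, 8.0, 10.0], [8, 16, 32])
print(stat, df)
```

A statistic that does not exceed the tabled chi-square value leaves the hypothesis of a common variance standing, which is the condition under which the text regards pooling as proper.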

Although the procedure of using interactions as estimates of 
experimental error has been followed in much published research,
we should keep in mind that there is no substitute for
replication. If there is an a priori reason for expecting inter¬ 
actions to be significant, a test, based upon replication, should 
be provided in the design of the experiment. If the interaction 
mean squares are significant, then their use as an estimate of 
the mean square that would have been obtained with replica¬ 
tion, the within-groups mean square, may result in an under¬ 
evaluation of the significance of the main experimental vari¬ 
ables. 


REFERENCES 

1. Bartlett, M. S. “Some Examples of Statistical Methods of Research
   in Agriculture and Applied Biology.” Journal of the
   Royal Statistical Society Supplement, IV (1937), 137-170.

2. Brozek, J. and Alexander, H. “A Note on the Components of
   Variation in a Two-Way Table.” American Journal of Psychology,
   LX (1947), 629-636.

3. Crutchfield, R. S. “Efficient Factorial Design.” Journal of Psychology,
   V (1938), 339-346.

4. Grant, D. A. “The Latin Square Principle in the Design and
   Analysis of Psychological Experiments.” Psychological Bulletin,
   XLV (1948), 427-442.

5. Kogan, L. S. “Analysis of Variance—Repeated Measurements.”
   Psychological Bulletin, XLV (1948), 131-143.

6. Lindquist, E. F. Statistical Analysis in Educational Research. Boston:
   Houghton-Mifflin, 1940.

7. McNemar, Q. Psychological Statistics. New York: Wiley, 1949.

8. Snedecor, G. W. Statistical Methods. (4th ed.) Ames, Iowa: State
   College Press, 1946.



THE OBJECTIVE MEASUREMENT OF 
DYNAMIC TRAITS 

R. B. CATTELL, A. B. HEIST, P. A. HEIST and R. G. STEWART 
The Ergic Theory of Attitude Measurement 

It is disconcerting that psychologists have not yet found any 
more objective way of measuring an individual’s attitudes and 
interests than by asking him how strong they are. In 1935 the 
present writer demonstrated some degree of validity in measures 
of spontaneous attention and of memory, for matters of interest 
(3). But, apart from the work of Super (17) and one or two 
sporadic, incidental uses of these newer methods, the bulk of 
research has continued to concentrate on refinements of verbal, 
self-declaratory attitude and interest scales (12, 14), which, in 
the writer’s opinion, can never satisfy the need for scientific, 
behavioral objectivity and meaning. Even the applied psy¬ 
chologists working with polls and socio-economic attitudes have 
regretfully had to realize that what a man says is unpredictably 
different from what he does and sometimes, indeed, from what 
he said an hour before (14). The present research, and two 
studies reported elsewhere (8, 9), are attempts to follow up on 
a more adequate scale, and to expand in new directions the 
original statement (3) of design for objective interest measure¬ 
ment. 

Dynamic traits are divisible into ergs, or basic innate drives, 
on the one hand, and metanergs, or attitudes and sentiments, 
on the other (4, 5). The present study is concerned with 
attitudes, but, since the attitude is, in respect to modes of 
measurement, a prototype of all dynamic traits, the methods 
developed here have reference, and are applicable to, dynamic 
traits generally. 

An attitude needs to be defined initially by five aspects, 
which are summarized in the paradigm: 

“(1) In these circumstances (2) I (3) want so much (4) to do 
this (5) with that.” 


Here (1) defines the stimulus situation with reference to 
which the attitude is evoked, (2) the organism bearing the 
attitude, (3) strength of interest in the course of action indi¬ 
cated, (4) the kind of action indicated and (5) the object with 
which the attitude is connected. Sometimes (1) and (5) are 
the same. 

According to the ergic theory of attitude measurement (5) 
an attitude may be expressed, for purposes of analysis and 
calculation, as a vector quantity, in which the length of the 
vector represents the strength of desire for (interest in) the 
defined course of action, and its direction represents its dynamic 
composition. It assumes that ergic coordinates can be discovered 
and defined by appropriate factor analytic procedures so that 
by giving the direction of the attitude with respect to these 
coordinates we describe the extent to which various ergs, e.g., 
hunger, sex, self assertion, pugnacity, gain expression through 
the attitude in question. An attitude is thus not regarded, by 
the ergic theory, as adequately expressed by the existing con¬ 
vention of pro- and con- an object; for an attitude about an 
object is far richer than a single dimension can express and is 
better defined in terms of all those basic-drive satisfactions 
which the given action to the object produces. One can, of 
course, correctly speak of a pro-con scale with respect to a 
defined course of action, i.e., one already defined in direction, as 
above. But a person may utilize the same object for many 
different courses of action, so that for this reason, as well as 
because of the possibility of fuller understanding given by 
expressing the ergic composition of the course of action, it is 
psychologically meaningless to speak of being “pro” or “con” 
an object. 

The above discussion of basic theory is necessary if the 
meaning of the present experiments is to be understood and 
their findings properly applied. It leads to a formula for the 
strength of an attitude parallel to that used in the specification 
equation for expressing some particular skill in terms of primary 
abilities (4), as follows:

\[ I_{ij} = S_{1j}E_{1i} + S_{2j}E_{2i} + \cdots + S_{nj}E_{ni} + S_{j}E_{ji} \]

where I is the strength of interest of the individual i in the
course of action defined by the attitude j. The S’s are the factor
loadings, which in this case we shall call the dynamic situational
indices defining the extent to which the various ergs or drives,
E_1, E_2, etc., are involved (for the average member of the
population) in determining the course of action concerned. It
is our purpose to measure I, the strength of interest, by more
objective methods. The measurement of the S’s, i.e., the directions
of the attitude vectors, is described elsewhere (5, 8). The
measurement of the strength of an attitude is thus a measure¬ 
ment of interest. An attitude is measured when we measure both 
interest and ergic composition, i.e., length and direction of the 
vector. 
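The specification equation above is a weighted sum, and a brief sketch may make its arithmetic concrete. The loadings and erg scores below are invented for illustration, and the function name `interest_strength` is hypothetical; nothing here reproduces values from the present experiments.

```python
def interest_strength(loadings, erg_scores, specific_loading=0.0, specific_score=0.0):
    """Strength of interest of one individual in one course of action.

    loadings    -- situational indices S for the attitude (one per erg)
    erg_scores  -- the individual's endowment E in each erg
    The last two arguments stand for the specific term of the equation.
    """
    common = sum(s * e for s, e in zip(loadings, erg_scores))
    return common + specific_loading * specific_score


# Hypothetical attitude loaded on three ergs, and one hypothetical individual
loadings = [0.5, 0.3, 0.2]
erg_scores = [1.0, 2.0, -1.0]
strength = interest_strength(loadings, erg_scores)
print(strength)
```

Two individuals with the same total strength may reach it through different ergs, which is why the text treats strength and ergic composition as separate things to be measured.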

Possible Approaches to Objective Measurement of Dynamic Traits

Considering an attitude as a dynamic trait, it is easy to 
perceive, from what is already known about psychodynamics, 
that there is a wide array of possible principles for the objective 
measurement of attitude strengths. The following will be briefly 
discussed here and the majority, those starred, will have their 
application to experiments described precisely. 

A. Criterion Methods, (a) Interactive.—By these are meant
methods of measurement too long and difficult for routine test 
use, but which, when properly applied (19), supply data that can 
be taken as a true measurement of what is meant by interest— 
in objective, “interactive” (4) units—in the real life situation. 

* (1) Money. Fraction or absolute amount of the individual’s
income that he spends on certain courses of
action.

* (2) Time. Fraction of the individual’s time that he gives
to certain courses of action. (18)

B. Criterion Methods, (b) Solipsistic.—By these are meant
methods of measurement dependent on introspection and self 
assessment but which, in the specially controlled circumstances 
of experiment with intelligent, cooperative subjects, can be 
used as criterion data. 

* (3) The classical “opinionaire” method, as used by Thurstone
(20) and others.

* (4) The “preference” method, in which the individual is
presented with alternate courses of action (attitudes)
and asked which he would prefer to satisfy. This is
done in all possible paired comparisons among, in
this case, 50 attitudes, and thus supplies a more thorough,
pointed measure along the lines of (3). It is the
same situation for human beings as that presented to
animals in the classical “choice box” experiment on
motivation strength (21) except that the reaction is
verbal only.

C. Attention-Memory (Learning) Methods; (a) in the immediate
situation.—These depend on the principle that interest (incentive)
is a determiner of attention, rate of learning, inhibitory
effects on other processes, etc., and seek to measure interest
through such effects.

(5) Attention time. Recording the length of time or the 
rank order in which the individual will spontaneously 
attend to various stimuli. 

* (6) Immediate Memory. Since there seems little point, as
far as we know, in separating measures of “observation”
from “immediate memory,” this records instead
of “attention” the amount of various interest data
recalled almost immediately after exposure. As indicated
later, the measure was tried separately for statements
facilitating the expression of the attitude and
statements frustrating it.

(7) Reminiscence. It would seem likely that reminiscence, 
the selective action of memory as determined by con¬ 
trasting immediate with more remote recollection, 
might be particularly correlated with interest. 

* (8) Distraction. This method aims at measuring the attention
effect indirectly by recording the failure to perceive
surrounding material when the interesting object
is presented.

(9) Retro-active Inhibition. As with distraction, the interest 
an individual has for certain matters, particularly in 
the deeper interests, might be validly measured by the 
amount of retro-active inhibition their consideration 
exerts upon some prior, standard learning process. 

D. Methods Appraising Cognitive and Dynamic Structure due
to Interests.—The methods under C depend on learning effects
of interest in the immediate test situation, but if we are willing
to accept the slight error due to time lag we can measure
interests alternatively by the effects they have had in the course 
of time upon information, skills and dynamic response habits. 

* (10) Information. This method tests the individual’s information
about facts, devices, etc., necessary to implement
the course of action in which he is interested (not
necessarily knowledge about the object).

* (11) Speed of Decision (Reaction time). This method assumes
that decisions will be given more quickly for
questions in regard to which the individual has more
intense conviction. Preliminary work already indicates
the probability of this.¹

(12) Level of Skills. The extent of the built-up skills in a
certain course of action may, like the level of information,
provide a measure of the strength of interest
therein, e.g., performance on a piano provides an index
of musical interests, or skill in shooting of hunting
interests. Time and errors in suitably chosen diagnostic
performances would thus provide a measure of
this area. So also might speed of decision in a different
context from (11) above, namely in that there would
be, through practice, greater quickness in making decisions
in those fields with which S is familiar.

E. Autism Methods. —In research on so-called “projective” 
tests the present writer has pointed out (7) that devices in this 
area are more aptly called apperception tests (since such meas¬ 
ures include both cognitive and dynamic sources of distortion). 
Within the apperceptive class, however, we may distinguish 
autism tests, which deal with distortions of perception, reasoning 
and memory through dynamic traits alone. Ego defense dynamisms 
tests are a sub-category within autism tests. The autism methods
used to measure the dynamic traits of special significance to
personality are obviously applicable to interests in general,
though the defense dynamism tests are not so relevant.

¹ Chant and Salter (10), presenting an “attitude to war” opinionaire to a group of
mainly pacifist subjects, found that items which demanded longer decision had a larger
P.G.R. (0.72 ± .07), but that more “militaristic” items had larger P.G.R. and more
neutral items a longer decision time (i.e., curvilinear relationships exist). What bears
more simply on our approach is their finding that rejected statements had larger deflections
(.71 ± .16) and longer decision times (mean 2.6 ± .08 greater than accepted).

At the reading of the present paper at the annual Mid-Western A.P.A. meeting in
Chicago 1949, Gallenbeck (13) announced that he had results, but more finely analyzed,
entirely confirming the relation between affirmative decision times and strength of
convictions, presented here. The results of Postman (15) are also in agreement with
this use of decision time as a strength of attitude measure.

* (13) Misperception (Perceptual Autism or Illusion).—In
this method defective sensory presentations (mainly of
words) are made such that the individual may be 
tempted to apperceive them in accordance with his 
wishes. He is scored on the number misperceived to 
fit in with his attitude. 

* (14) False Belief (Reasoning Autism or Delusion).—The
method presents a number of manipulatable state¬ 
ments of fact and logic so chosen that the individual 
with a strong attitude will experience a need to distort 
his factual beliefs in a certain direction better to 
support his attitude. 

* (15) Phantasy. This method treats phantasy in toto and
not merely the defense dynamism forms. A measure of
time spent phantasying or of choice of phantasy read¬ 
ing in presented alternatives is recorded. 

* (16) Projection (Defense dynamism). Two types of controlled,
selective answer tests are possible in this area:
(a) That in which the picture or the verbal statement
of activity is fixed and the subject selects the best of 
the alternative dynamic “explanation” of the behavior 
(See design in (9)). (b) That in which the subject 
chooses the activities, from a presented list, of which 
he prefers to “explain” the motive. The latter is 
psychologically more complex but has not been tried 
and it was the especial interest of one co-worker to 
try it out here. (9) 

(17) Ego Defense Dynamisms. It is possible that any other 
defense dynamism, e.g., reaction formation, identifi¬ 
cation, rationalization, true projection, defensive 
phantasy could be used here, by methods described 
elsewhere (7), but such methods would be restricted 
by applying only to interests connected with ego con¬ 
flicts and were not tried out at this stage of ex¬ 
ploration. 

F. Activity Level Methods, (a) Psychological.—In this category,
which includes some relatively miscellaneous approaches,
we include attempts to measure increases in the general ex¬ 
citement level of the organism due to arousal of interest by the 
stimulus in the experiment. 

* (18) Fluency. A measure of the sheer amount written, in a
given time, in a “completion” test of statements con¬
cerning a given attitude. 

* (19) Speed of Reading. A method based on the hypothesis
that an individual will read more rapidly material
which interests him and which is in agreement with 
his own attitudes. 

* (20) Work-Endurance Measures. This method plans to
measure work output (endurance of fatigue) or endurance
of pain or discomfort in the interest of various
attitudes and is thus analogous to the obstruction
method in animal motivation studies (21). Miniature
situations involving satisfaction of the particular at¬ 
titudes could be made, for example, in terms of satis¬ 
faction of curiosity in reading about facts contribu¬ 
tory to the total attitude satisfaction. 

G. Activity Level Methods, (b) Physiological.—The known,
promising methods of measuring increase in activity level are 
greater in the physiological field, where autonomic and meta¬ 
bolic measures have been more developed. 

* (21) Psychogalvanic Response. The percentage decrease in
resistance was measured on exposure of statements
favoring and opposing the given attitude.

* (22) Pulse Rate. Difference of rate before and after presentation
of stimulus defining attitude.

(23) Metabolic Rate. A better measure, to which the above 
is only an approximation, would be the increase in 
metabolic rate following, in a discovered optimum 
period, the presentation of the attitude statements. 
Because of technical difficulties we had to be content 
with (22). 

(24) Muscle tension. There is evidence in the work of Duffy 
that general muscle tension is as sensitive and reliable 
a measure of conation as is the P.G.R. For lack of 
further work confirming the measurement of conation
by this method, however, we eventually did not use
general tension, but (25) below. 

* (25) Writing Pressure. The subject was asked to write 
“Yes” or “No” according to his reaction to presented
attitude statements. A device beneath the writing 
desk measured the handwriting pressure he exerted in 
these responses. 

Twenty-five distinct methods of objective attitude measure¬ 
ments are suggested, above, to be of promise; but nine of 
them—(5), (7), (9), (12), (15), (17), (20), (23) and (24)—were 
not tried in the present experiment, some because of special 
technical difficulties, some because of similarity to methods 
already in the sample and some, namely (5), (15), and (17), 
because an idea of their effectiveness has already been gained 
from earlier research (3), (7), (11). Of the sixteen methods
tried, twelve are described here and the rest elsewhere (9). 

The Experimental Design

The proof of goodness of an attitude measurement method is 
valuable only if it applies to any kind of attitude. Consequently, 
it was our objective to design the experiment so that a wide 
range of methods could be applied to a sufficient sample of a 
wide range of attitudes. Twelve attitudes were taken, sampled 
from (1) those of massive importance in everyday life (and 
therefore of interest to clinicians), from (2) those sampling 
distinct basic drives and (3) those of different social and intel¬ 
lectual interest areas (such as have been of interest to social 
psychologists). The list was based mainly on the fifteen categories
of Cattell’s Interest Test (6).

The twelve attitudes chosen for experiment with the various 
measurement methods here described were actually adminis¬ 
tered to the group in a total set of fifty attitudes, in connection 
with an experiment described elsewhere (8). This inclusion in a 
large group gave certain advantages, notably, that the prefer¬ 
ence score could be the rank order in fifty attitudes rather than 
in twelve. The twelve attitudes are set out below according to 
their index numbers among the fifty (8). 

(1) I want to play more indoor sociable games, such as card 
games.


(2) I want to spend somewhat more on drinking and smoking 
than I am now able to do. 

(6) I want to become proficient—if possible to excel my 
colleagues—in my chosen career. 

(10) I want more time to enjoy sleep and rest. 

(11) I want to listen to music.

(16) I want to know more science. 

(19) I want to see organized religion maintain or increase its 
influence. 

(22) I want to attend football games and follow the fate of 
teams. 

(30) I like to see a good movie or play every week or so. 

(34) I want to get my wife the clothes she likes and to save her 
from the more toilsome household drudgeries. 

(36) I want to be smartly dressed, with a personal appearance 
that commands admiration. 

(44) I want to feel that I am in touch with God, or some prin¬ 
ciple in the universe that gives meaning and help in my 
struggles. 

Upon these twelve attitudes the twelve methods of measure¬ 
ment set out below were tried. Four methods—(4), (6), (10) 
and (21)—were tried on all attitudes; two methods—(1) and 
(2)—were tried on seven attitudes; and the remaining, newer
methods were tried on one attitude each. 

Brief List of Methods Examined Here (Entirely New
Methods in Italics)

(1) Money expended 

(2) Time expended 
(4) Preferences

(6) Immediate Memory 
(8) Distraction 

(10) Information 

(11) Speed of Decision 

(13) Misperception (Illusion)

(14) False Belief (Delusion)

(18) Fluency 

(19) Speed of Reading 

(21) Psychogalvanic Response 

It was our aim to measure validity in terms of correlation with 
the pooled result of all methods. But from existing information it 
is likely that some methods are better than others and, indeed, 
six of the above methods, those in italics, are “long shots,” with 
no previous work on them whatever; so we decided to make the 
validating core out of the first six—hereinafter designated 
“tried” tests, because previous work has shown (1), (3), (11),
(16), (17), (18), (19) some degree of validity. Also, we desired
to know the relative goodness of these first six tried tests with
greater accuracy, whereas we were interested only as to whether 
there exists any validity at all in the exploratory (italicised) 
tests. It was for this reason that the tried test methods were 
applied to the majority of attitudes, but each of the exploratory 
methods was tried on one attitude only. 
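The validation scheme just outlined, correlating each method with a pooled result, can be sketched as follows. The equal weighting of standardized scores is an assumption made here for illustration, the scores are invented, and the function names are hypothetical; the pooling actually used in the study may have differed.

```python
import math


def zscores(xs):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def pearson(xs, ys):
    """Product-moment correlation between two lists of equal length."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def validity_against_pool(scores_by_method):
    """Correlate each method with the mean standardized score of all methods."""
    z = {m: zscores(v) for m, v in scores_by_method.items()}
    n = len(next(iter(z.values())))
    pool = [sum(z[m][i] for m in z) / len(z) for i in range(n)]
    return {m: pearson(v, pool) for m, v in scores_by_method.items()}


# Hypothetical scores of five subjects on three methods for one attitude
scores = {
    "preference": [1, 2, 3, 4, 5],
    "memory": [2, 1, 4, 3, 5],
    "information": [1, 3, 2, 5, 4],
}
validities = validity_against_pool(scores)
print(validities)
```

Because each method contributes to the pool against which it is correlated, coefficients obtained this way are somewhat inflated; restricting the pool to a core of better-established methods is one way of limiting the influence of untried ones.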

The subjects were a population homogeneous as to sex (men) 
and chosen to have family interests (all were married) but 
otherwise diverse (some students, some business men) and 
ranging in age from 20-40 (80 per cent between 25 and 33) so
that though all possessed the attitudes in question they would
do so in diverse degrees. Six methods (the “tried” methods)
were applied to all subjects but not on all attitudes, for each 40 
subjects took a different pair of attitudes. The six exploratory 
methods were therefore each applied only to one attitude and 
40 subjects. 

A more detailed statement of the method of administration 
of the twelve methods follows: 

(1) Money Expended—(No. 1 in general list; used on all
attitudes.)—Two weeks a month apart and clear of any special
holiday season, were taken and S was asked to record his 
expenditure on the particular interest activity concerned for 
the whole week. Reliability coefficients were calculated with 
respect to the two-week periods. 

(2) Time Expended—(No. 2 in general list; used on all attitudes.)—In
the same two weeks S recorded separately for each
and at the time the number of hours spent in the given activity
interest. (See (18).)

(3) Preference—(No. 4 in general list above; used on all
attitudes.)—A matrix of cells was constructed, constituted by 
the triangular area bounded by the full fifty attitudes arranged 
in rows on the right and in columns from right to left. Each 
cell thus represented a possible comparison of the strength of 
one attitude with that of another. S thus made 1225 paired 
comparisons, indicating in each case which of the two attitude 
goals concerned in the comparison he would rather satisfy. The 
score for a given attitude was the fraction of the 49 compari¬ 
sons in which it was the preferred member. 
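
The arithmetic of this design can be checked with a short sketch. Fifty attitudes give 50 × 49 / 2 = 1225 pairs, and each attitude enters 49 of them. The function name and the simulated subject (who always prefers the higher-numbered attitude) are hypothetical and serve only to exercise the scoring rule.

```python
from itertools import combinations

def preference_scores(attitudes, prefers):
    """Score each attitude as the fraction of its comparisons in which it was preferred.

    prefers(a, b) must return whichever of a and b the subject would rather satisfy.
    """
    wins = {a: 0 for a in attitudes}
    n_pairs = 0
    for a, b in combinations(attitudes, 2):
        wins[prefers(a, b)] += 1
        n_pairs += 1
    per_attitude = len(attitudes) - 1  # 49 comparisons when there are 50 attitudes
    scores = {a: wins[a] / per_attitude for a in attitudes}
    return scores, n_pairs

# Hypothetical subject: always prefers the higher-numbered attitude.
attitudes = list(range(1, 51))
scores, n_pairs = preference_scores(attitudes, prefers=max)
```

Because every comparison awards exactly one win, the scores of any subject sum to the same total (1225 / 49 = 25), which is why this method is ipsative by construction.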

(4) Immediate Memory —(No. 6 in above list; used on all 





attitudes.)—A series of 500 brief statements, 10 to an attitude, 
equally divided among those pro and con each of the attitudes 
were presented tachistoscopically at 6-second intervals. They 
were presented in a series of 42 discs, each consisting of 12 
statements randomly mixed from among the fifty attitudes. As 
examples of the five pro or facilitating and five frustrating 
stimuli used in connection with each attitude we may take 
from attitude 6 (wanting success in one’s career) the two statements 
“Success in career assures happiness” and “The successful 
careerman is always selfish.” S was told at the beginning 
that after every 12 statements (and a pause of 25 seconds) he 
would be asked to recall, in 30 seconds, all that he could remember 
of “the phrases, statements or ideas presented in the 
last period.” Credit for recall was given when the essential idea 
of the item was re-iterated regardless of verbal form. This 
same situation and set of attitudes was used simultaneously to 
get the P. G. R. responses described below. 

(5) Distraction —(No. 8 in above list; used on attitudes 36 
through 40.)—Statements similar to the above were exposed, 
ten to each attitude but intermixed. S was told he would be 
given 10 seconds to look at each statement and might be asked 
to repeat it (he was asked intermittently) as well as to recall 
the nonsense syllables scattered around the statement. Twelve 
or thirteen nonsense syllables were in the margins around each 
statement. S was given 10 seconds to write down the recalled 
items. 


(6) Information —(No. 10 in general list; used on all attitudes.)—Ten 
information items, each with multiple-choice 
selective answers, were presented for each attitude. The information 
dealt, not with the object (which would measure total 
interest in the object) but with knowledge required in following 
the course of action connected with the attitude. S was asked to 
leave no item unanswered but to guess. Scoring was on total number 
right. A typical example may be taken from attitude 22, on 
wanting to follow football games as a spectator: 

“In the (Orange / Sugar / Cotton) Bowl game of January 1st, 1948, (Georgia / Michigan / S.M.U.) 






(7) Speed of Decision —(No. 11 in general list; used on attitude 
No. 1.)—Ten questions were presented for each attitude. 
They were chosen to be such that all S’s would give some degree 
of affirmative answer, and S’s were told to give an answer in the 
form “Probably,” “Yes” or “Certainly,” i.e., definitely and 
emphatically yes. For example, “Do you want the sale of liquor 
to children to be prohibited?” This uni-directional response was 
necessary because previous research has indicated (15) that a 
short decision time is associated both with very affirmative and 
very negative responses. We need a question such that reaction 
time would work only in one direction. 

(8) Misperception (Illusion) —(No. 13 in general list; used on 
attitude No. 1.)—Ten attitude statements, positively expressing 
the attitude, were presented for each attitude. S was 
instructed to expect 1 second tachistoscopic exposures of sentences, 
to repeat what they said and to note any misspellings. 
Sentences were such as “I want to eat a chocholate sundea,” 
“I want to reduse my weight thruogh work.” Ten statements 
not connected with any dynamic need were presented as a 
control on S’s normal carefulness of spelling perception. 

(9) False Belief (Delusion) —(No. 14 in general list; used on 
attitudes 41, 42, 43 and 44.)—Ten statements for each attitude 
were presented to S as an “Information Test.” The five 
multiple-choice alternative factual endings to each statement 
were such as to give greater or less factual support to the 
attitude S might desire to maintain. Thus on attitude 44, 
“During the war church attendance increased greatly and since 
V-J day it has (declined slightly; tended to increase still more; 
stayed at its high peak; returned to its pre-war level; fallen to 
its lowest point since 1920).” 

(10) Fluency —(No. 18 in general list; attitudes 31, 32, 33, 34, 
35.)—S was shown each of the ten statements originally used 
to express each attitude and was told to write as much on the 
topic of each as possible in 1 minute. It was noted that this 
‘fluency’ increased slightly but steadily in successive attitudes, 
so S was run through attitudes in both directions. At this 
administration no check was kept of relative fluency on pro and 
con statements. Score was total number of words produced. 

(11) Speed of Reading —(No. 19 in general list; tried on 
attitudes 14, 15, 16, and 17).—Six statements were presented 




for each attitude, three favoring the attitude and three against 
it, but in random order. S was timed on reading statements 
aloud, the negative item speed being subtracted from the 
positive on the assumption that S would read more rapidly 
those statements which expressed his desires. 

(12) Magnitude of Psychogalvanic Response —(No. 21 in 
general list; tried on all attitudes).—The P.G.R. was applied 
with the technical conditions described in earlier work by the 
senior author (2), the deflection being measured in percentage 
of the absolute resistance. For each attitude the deflection was 
taken to tachistoscopic exposures of five statements favoring 
the attitude and five opposing it, the instructions and exposed 
statements being those used in the Immediate Memory Test. 


TABLE 1 

Reliabilities of “Tried” Methods 

                                       Attitude Number*                         Mean thro’
Method                  1    2    6   10   11   16   19   22   30   34   36   44  Z score

(10) Information      .64  .39  .23  .20  .86  .14  .68  .92  .90  .31  .36  .?0    .59
(21) Psychogalvan.    .32  .31  .?4  .98  .96  .84  .88  .93  .96  .80  .70  .63    .85
(4) Preference        .70  .89  .88  .88  .88  .92  .96  .87  .67  .90  .92  .98    .90
(6) Immed. Memory     .32  .13  .30  .47  .53  .21   /   .74  .86  .47  .21  .44    .50
(2) Time Exp.         .92   /   .38  .96  .97   /   .98  .99  .71   /   .94   /     .94
(1) Money Exp.         /   .86  .48  .98  .96   /   .99  .99  .96  .84  .67   /     .94

* These correspond to the numbers in the complete description of fifty attitudes 
in (8). 
? = digit illegible in the source. 


Scoring was carried out for facilitating and frustrating sets 
separately and also for all together, as discussed below. 

Results 

As indicated above, the measurement of each attitude was 
split wherever possible into two sets of five items, in order to 
get a reliability; but, where the measures had first to be split 
into pro and con items, the reliability was reduced to two items 
against three. 

The reliabilities for test forms applied to all twelve attitudes 
and corrected to 10-item length are as shown in Table 1. 
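
A correction from two five-item halves to the full 10-item length is conventionally made with the Spearman-Brown formula, r_kk = k r / (1 + (k − 1) r). The text does not name the formula used, so the sketch below is an assumption about the procedure; the function name and the sample value are illustrative only.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when a test is lengthened k-fold (Spearman-Brown)."""
    return k * r / (1.0 + (k - 1.0) * r)

# Hypothetical correlation between two 5-item halves (not a figure from the study).
half_r = 0.50
full_r = spearman_brown(half_r)  # stepped up to 10-item length
```

Where the split is two items against three, the halves are unequal and the simple doubling above is only approximate.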

For Immediate Memory with unfavorable statements (Attitudes 
6 and 10) the reliability was .32; for facilitating statements, 
.45; for the Distraction measure (Att. 36), .64; for Speed 










of Reading (Att. 16), .79; for Misperception (Att. 2), .43; for 
False Belief (Att. 44), .53; for Fluency (Att. 34), .68; and for 
Speed of Decision (Att. 1), .90. Apart from the methods of 
comparing the speed of reading of favorable and unfavorable 
views and the method of misperception of spelling, therefore, 
any failure of a method to attain recognizable validity cannot 
be imputed to any large extent to unreliability of the tests. 
These two methods, as well as the immediate memory method, 
however, evidently need improvement in items and procedure, 
to gain reliability sufficient for a more exact appraisal of 
validity. Information and P.G.R. could also be improved on 
certain attitudes which offer specific difficulties in test item 
design. For example, Attitude 10, “I want more time to enjoy 
sleep and rest,” evidently makes severe demands upon the 


TABLE 2 

Validities of “Tried” Methods 

Intercorrelations of the methods (entries as recovered, in source order; 
the column positions of incomplete rows are not recoverable) 

1. Time Exp. 
2. Money Exp.         .47 
3. Preference         .?5 
4. Information        .16  .16 
5. Immed. Memory      .01  .03  .13  .01 
6. Psychogalvan.      .0?  .03  .04  .1? 

                                            1    2    3    4    5    6
Row 1 Mean validity in regard to all       .19  .19  .16  .10  .05  .07
Row 2 Mean for first four methods          .28  .29  .21  .16  .02  .05
Row 3 Mean reliabilities                   .94  .94  .90  .59  .50  .85

? = digit illegible in the source. 


experimenter’s subtlety in choosing information items connected 
with this interest, for the ten items used attained a split-half 
reliability of only .20. 

For the six methods used on all twelve attitudes, twelve 
correlation matrices were worked out and averaged (cell by 
cell) by Fisher’s Z function, to give the values in Table 2 
for the mean intercorrelation of the different methods applied 
to a representative set of attitudes. 
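
As a minimal sketch of the averaging step (the function name is illustrative), each r is transformed to z = artanh r, the z's are averaged, and the mean is transformed back to r. The input is the Preference row of reliabilities in Table 1, reading the entry printed as “87” as .87, which is an assumption. Under that reading the result rounds to the tabled mean of .90.

```python
import math

def mean_r_via_fisher_z(rs):
    """Average correlations through Fisher's z: atanh each r, take the mean, tanh back."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Preference reliabilities for the twelve attitudes, as read from Table 1.
preference = [.70, .89, .88, .88, .88, .92, .96, .87, .67, .90, .92, .98]
mean_pref = mean_r_via_fisher_z(preference)
arithmetic_mean = sum(preference) / len(preference)
```

The z-averaged value lies above the plain arithmetic mean of the same coefficients, since the transformation gives more weight to the high correlations.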

No factor analysis has been attempted on so few variables, 
but what is substantially the loading of each method in the 
first general factor has been indicated by averaging its correla¬ 
tions with all other methods. This “internal validity” we shall 
take as the best basis for deciding the relative validities of 

















The calculation of the standard error on these correlations is 
somewhat complex. Each r in the body of the matrix is the 
mean of eight to twelve r’s on 40 men each. Since they are 
averaged through Fisher’s Z function the standard error of each 
would have √(N−3) in the denominator, so that the standard 
error of the mean would be equivalent to an r on a population 
of between (N−3) × 8 and (N−3) × 12, i.e., 296 to 414. However, 
the validation r’s are each the mean of five r’s each with the 
above standard error. The fact that the five latter represent 
independent experiments but not independent groups creates 
some difficulty, but assuming independence through experiment 
and applying the N−3 denominator we arrive at from 1480 to 
2070 cases as the population on which the r’s in Row 1 are 
based. On this basis the validities of methods 1, 2, 3 and 4 are 

TABLE 3 

Reliabilities and Validities of Exploratory Methods 

                                                    Method of   Method of    Money or      Mean
Method                     Attitude   Reliability   Preference  Information  Time & Money  of all

Speed of Decision          No. 1      .90           .09         .16          .16           .16
Distraction                No. 36     .64           .19         .35          .10 & .08     .18
Misperception (Illusion)   No. 2      .43           .00         .13          .01           .06
False Belief (Delusion)    No. 44     .53           .33         .11          .10           .18
Fluency                    No. 34     .68           .2?         .01          .03           .08
Speed of Reading           No. 16     .79           .11         .06          .00           .05

? = digit illegible in the source. 


significant at the 1 per cent level, 6 at between the 1 and 5 per cent 
level and 5 barely at the 5 per cent level, though its correlations 
are consistently positive. 
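
The pooling argument above can be sketched as follows, assuming (as the text does) that the averaged r's are independent. The function names are illustrative. With N = 40 the lower figures come out as 296 and 1480, as printed; at the upper end the same arithmetic gives 444 and 2220, somewhat above the printed figures, which may be worth checking against the original.

```python
import math

def pooled_cases(n_per_sample, n_samples):
    """Sum of (N - 3) over the samples whose r's are averaged through z."""
    return (n_per_sample - 3) * n_samples

def se_mean_z(n_per_sample, n_samples):
    """Standard error of the mean z, taken as 1 / sqrt(sum of (N - 3))."""
    return 1.0 / math.sqrt(pooled_cases(n_per_sample, n_samples))

# Each matrix entry: mean of 8 to 12 r's on 40 men each.
low, high = pooled_cases(40, 8), pooled_cases(40, 12)
# Each validation r: mean of five such entries, treated as independent.
row_low, row_high = 5 * low, 5 * high
```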

The results for the six newer methods are set out in Table 3 
which shows, first, the reliability of the measurement and the 
attitude (Numbered as above) upon which it was tried; second, 
its correlations with the best four methods (1, 2, 3 and 4) above, 
and last, its mean correlation with all methods tried, usually six. 

Speed of decision, false belief and distraction are the only 
methods in which the pattern of correlations indicates some 
validity (at the 5 per cent level). Evidently the finding of 
Bruner (1) that misperception effects can arise from attitudes 
is one which shows up in differences of means but is not strong 
or constant enough to show up in the more exacting examina¬ 
tion by correlations and with methods of this kind. 














Speed of reading seems unrelated to agreement with the views 
read and there is only a faint suggestion that fluency is related, 
though both show their highest correlation with the best 
method, namely Preference. These and the other newer methods 
are being tried out again, each on ten attitudes, since the pecu¬ 
liarities of a single attitude, as in the present research, may 
give an unfair impression. 

Certain possibilities in both the more basic and the more 
exploratory methods remain to be examined, notably (a) the 
possibility that higher validities will be found in ipsative (4) 
than with normative scoring, (b) the possibility that some 
relations are curvilinear, (c) the possibility that there are con¬ 
trasting effects not only between stimuli that have to do with 
an attitude and those that do not, but also between those that 
favor and those that frustrate the attitude. 

It will be remembered that ipsative scoring expresses the 
score relative to some average or total of the given individual, 
whereas normative scoring expresses it relative to the distribution 
in the group (4). Where the raw score expresses some real 
interaction of the individual with his environment, some behavior 
that may be considered a real function of interest, as the 
tests of information, time and money expenditures, etc., do, 
the present figures were scored normatively, i.e., in standard 
scores, before correlating. Preferences, the P.G.R. and the Immediate 
Memory tests, however, were scored ipsatively, for in 
the last case, for example, the immediate score is clearly relative 
to the individual’s standards. His intelligence and memory may 
be such that he exceeds the score of another person on a particu¬ 
lar attitude even though his interest in that attitude is quite 
small. In the second case individual physiological differences in 
reactivity (one person may have an average P.G.R. deflection 
five times as large as another) need to be corrected. The first 
method, preferences, is automatically ipsative in scoring, since 
each person has the same total. 
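
The distinction can be illustrated with a small sketch. Here ipsative scoring is taken as the ratio of each score to the individual's own mean over all attitudes, which is one of several possible choices and an assumption on our part; the data and names are hypothetical.

```python
from statistics import mean, pstdev

def normative(scores_by_subject, attitude):
    """Standard scores for one attitude, relative to the distribution in the group."""
    column = [s[attitude] for s in scores_by_subject]
    m, sd = mean(column), pstdev(column)
    return [(x - m) / sd for x in column]

def ipsative(subject_scores):
    """Each score relative to the subject's own mean over all attitudes."""
    m = mean(subject_scores.values())
    return {a: x / m for a, x in subject_scores.items()}

# Hypothetical P.G.R. deflections: the second subject reacts five times as strongly.
subjects = [
    {"att1": 2.0, "att2": 4.0, "att3": 6.0},
    {"att1": 10.0, "att2": 20.0, "att3": 30.0},
]
ips = [ipsative(s) for s in subjects]
norm_att1 = normative(subjects, "att1")
```

After ipsative rescaling the two subjects show the same profile, so the difference in general reactivity no longer enters the correlations, whereas the normative scores on a single attitude still separate them.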

This is no place to attempt a discussion of the ipsative- 
normative scoring problem, which, however, must be recognized 
as peculiarly insistent in the field of interest measurement 
and has, for that reason, been fully discussed in a first approach 
to the theory of interest testing (3). There is as yet no simple 




solution and indeed a claim can be made for putting almost 
any interest measure on an ipsative basis before putting it into 
normative scores. For example, the extent of the need ex¬ 
pressed in a money expenditure can only be properly gauged 
when we know how much money the individual possesses. 
However, in this dilemma we have thought it best to turn to 
ipsative scoring only when the individual differences in mean 
scores are patently great and when there are good reasons for 
believing that some personal constant, e.g., physiological re¬ 
activity or general power of immediate memory, mediates 
strongly between the behavioral expression of interest and the 
particular manifestation we have chosen to test. 

No digression comparable to the above will be taken into 
curvilinearity. Suffice it that one investigator (15) has shown 
that speed of decision is related to strength of conviction in 
bimodal fashion, a quick decision being made where attitudes 
are strongly for or against the question. A similar complexity 
has been found on the relationships of P.G.R. response and 
memory value (16) and P.G.R. response and speed of decision 
(10). 

However, in our correlation plots we have encountered no 
persistent curvilinearity, and with the exception of suggestions 
thereof in speed of decision, P.G.R. response, fluency and speed 
of reading, which require further investigation, we believe that 
there is no measurement problem in this respect. 

On the other hand, the problem of differences between the 
effects of statements favoring the successful expression of an 
attitude and those frustrating it is a very real one for certain 
methods, and in one method, the P.G.R., we had reason to 
believe that the poor validations obtained were due to the 
neutralization of two conflicting significant responses. Our 
search in this direction was stimulated also by the finding of 
Whately Smith (16) of a curvilinear relation between memory 
value of words and their P.G.R. deflection, such that the 
largest deflections were found both with words very well re¬ 
membered and words very poorly remembered. 

Consequently, of the ten items exposed both for the P.G.R. 
measures and for the Immediate Memory Test, five were made 
“facilitating” and five “frustrating” items, and the two sets 
were separately scored and correlated for the P.G.R. and for 
Immediate Memory. Owing 
to the complexity of the inter-relations and the lack of significance 
of some of the results only the positive indications will be 
briefly set out, as follows: 

Attitudes evoking larger deflections on facilitating items tend 
to have also larger deflections on frustration items than do other 
attitudes (r = .29 and .36) and the same occurs to a lesser 
extent in immediate memory measures. 

Larger deflections on facilitating items in an attitude are associated 
with poorer immediate memory for that attitude, particularly 
in its frustrating items (r = −.25 and −.36). The implications 
of the last statement, together with the Whately Smith 
findings, are clearly that both immediate memory and the 
P.G.R. have a more complex relation to interest than the 
simple linear one hoped for in this exploratory study. The bear¬ 
ings of this on further research are discussed below. 

Discussion 

Some observations not reducible to the above statistical 
digest need first to be added. These concern mainly the opera¬ 
tion of particular methods and can be presented seriatim. 

It was the general opinion of the experimenters that the 
reliabilities obtained for the expenditure of time and money 
methods were higher than the true dependability of the obser¬ 
vations warranted. Subjects, on close examination, were found 
to have been careless about their records of actual expenditures 
and to have made guesses, the similarity of which in the two 
weeks in question raised the apparent reliability. It is suggested 
that in further, more intensive experiments these records be 
kept in more detail and over longer periods than one week. In 
two attitudes, notably that dealing with expenditure on the 
wife and among students with very restricted means, some 
experimenters noted a curious tendency to inverse relationship 
between the amount spent and the stated intensity (Preference 
score) of interest. This general problem of the tendency of 
conscious, verbal intensity to be related to the extent of the 
frustration of the need rather than to the basic amount of need 
satisfaction occurring in the given attitude justifies special 
investigation. 




In the Immediate Memory Test the impression of experimenters 
while administering it was that it was not working very 
well. The usual positional effects were noticed (first and last 
in each run of 12 being best remembered) but these were can¬ 
celled as far as possible by giving each attitude an equal 
positional chance. Since briefer items were apparently more 
frequently remembered it is suggested that future work attempt 
to bring all stimulus statements to five-, six- or seven-word 
length. There is some evidence, additional to that implicit in 
the above correlations, that good validity could be obtained for 
a memory method concentrating on failure to remember frus¬ 
trating statements. In one attitude r’s of .40 with Preference 
and .14 with Time and Money were obtained for this “memory 
failure with contrary statements” score. 

Both in the memory test and in the P.G.R. some distortions 
were produced by items which were unintentionally embarrass¬ 
ing or amusing, and subjects were suspected of not repeating 
the former even though they remembered them. Experimenters 
also suspected that dynamic effects, both in memory and 
P.G.R., tended to spread from a particular item to the items 
that happened to be neighbors. Some of the poorness of validity 
of the P.G.R. test was believed by most experimenters to arise 
from purely technical difficulties, e.g., change of meaning of 
the size of deflection with different absolute resistances, so that 
improved apparatus, such as the self-recording and more accu¬ 
rately balanceable instrument since constructed, is expected to 
yield validities equivalent to the other methods. It is also sug¬ 
gested that one or two “buffer” items be introduced before each 
run of a dozen or so stimuli, since it was noted that the first 
items after an interval tended, regardless of significance, to 
produce appreciable deflections. 

However, the use of the P.G.R. and Immediate Memory 
Methods can never be satisfactory until the problem of the 
relative significance of responses to “facilitating-frustrating” 
stimuli, involving the above mentioned Whately-Smith effect, 
has been cleared up. The senior author believes that the current 
use of the P.G.R. could best be improved by using solely nocive 
(a specific variety of frustrating) stimuli and counting the 
response as a true function of the strength of the attitude 

threatened. 




Improvement of the promising ‘Distraction’ test is suggested 
through employing memorizing material more finely divisible 
and easier to remember than nonsense syllables. Numbers 
would be one such medium. 

In the equally promising Speed of Decision method it is 
possible that some useful compromise method might be worked 
out in which the extent of the subject’s stated agreement or 
disagreement would be taken into account as well as his de¬ 
cision time. This would bring the advantage that questions 
inviting negative answers could also be used and the experi¬ 
menter would not need to strain his ingenuity seeking questions 
that admit only of various degrees of positive answer. The rela¬ 
tion of decision time to degree of positiveness found in this 
method (for attitude No. i) is shown in Table 4. 

Although the above relation might not represent a correlation 
of more than .10 or .20, the combination of a speed score 


TABLE 4 

Relation of Speed to Positiveness of Decision 

                                                  Response
                                         Probably    Yes    Definitely
Times response given for 40 subjects       315       442       443
Average seconds per response               2.9       2.2       1.7


with a degree-of-assent score should reach an appreciably higher 
validity. 

So much for special methods. In the experiment as a whole 
the chief weaknesses resided in: (1) the great demand on the 
subject’s time, which tended to produce fatigue and boredom 
inconsistent with good cooperation; (2) the multiplicity of ex¬ 
perimenters (seven different people in various aspects of the 
undertaking); (3) the defectiveness of individual test items, 
notably in the Information, Immediate Memory and Misper¬ 
ception tests, due to absence of item analyses. 

The first is unavoidable, except with expensively hired sub¬ 
jects, if many methods are to be cross-validated in a widely 
planned exploratory study, but need not interfere in the more 
restricted local studies that can now be carried out with the 
knowledge here presented as to the general field. The second 
may be a blessing in disguise: if a method is such that it yields 










valid results in the hands of several experimenters one may be 
sure that it is a well-defined method and one valid in many 
circumstances. The third raises the general problem of whether 
item analyses should be carried out before or after the validity 
of a certain type of test has been established. The writers be¬ 
lieve that in exploratory studies the items should be designed 
on a sufficiently clear general principle. If this proves to have 
any validity the less valid items can later be combed out by 
item analysis (consistency with the test as a whole). 

The above considerations may indicate why the validity co¬ 
efficients of some methods have been called “acceptable and 
promising,” even though the correlations, significant at only 
the 5 per cent level, are still short of what would normally be 
considered good validity. Our first aim was an exploration to 
discover new methods of any real validity. The second aim, of 
improving them to practicable validity, can be predicted to 
encounter difficulties in certain cases. For when corrected for 
attenuation by low reliability the correlations with the criterion 
for most of the above methods still hover only between 0.3 and 
0.5, and we accept the position of Guilford that in psychometry 
validities below 0.5 are not of much practical use. 
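
The correction for attenuation referred to here is conventionally r_xy / √(r_xx r_yy). A minimal sketch follows; the input values are hypothetical and not taken from the tables, and the criterion is assumed to be perfectly reliable unless its reliability is supplied.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimated correlation between true scores: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical method: observed validity .30, reliability .64.
corrected = correct_for_attenuation(0.30, 0.64)
```

With these illustrative figures the corrected validity is .375, inside the range of 0.3 to 0.5 mentioned above.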

However, the improvements indicated above are likely to 
raise the validity more than the reliability, and it is, moreover, 
possible that the present reliabilities, as indicated above, are 
overestimated for certain tests. Nevertheless, even if it be sup¬ 
posed that the validities of the separate methods could never 
be raised above 0.5, a very acceptable and effective battery 
could be made from a combination of half a dozen of these 
methods. For apparently only to a slight degree accounted for 
by error, what is the specific element in each? Most likely it is 
a combination of (a) other dynamic traits partly determining 
interest in the specific items chosen to represent the attitude, 
(b) individual abilities and temperamental qualities affecting 
the given medium of measurement, e.g., power of memory in 
the memory test, autonomic reactivity in the P.G.R., (c) life 
circumstances which cause certain expressions of the attitude to 
be unused or inhibited in certain persons. 
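
The claim that a battery of about half a dozen methods could serve even if no single validity exceeds 0.5 can be illustrated with the standard formula for an equally weighted sum of k standardized measures, each correlating r_v with the criterion and r_ij on the average with one another: R = k r_v / √(k + k(k − 1) r_ij). The values below are hypothetical and chosen only to show the order of magnitude; the function name is ours.

```python
import math

def battery_validity(k, r_v, r_ij):
    """Validity of a unit-weighted sum of k standardized tests.

    r_v is the validity of each test, r_ij their mean intercorrelation.
    """
    return k * r_v / math.sqrt(k + k * (k - 1) * r_ij)

# Hypothetical battery: six methods, each of validity .5, intercorrelating .2.
battery = battery_validity(6, 0.5, 0.2)
```

The gain depends on the methods sharing little besides the attitude itself: the lower the intercorrelation r_ij for a given r_v, the higher the battery validity.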

There is obviously much scope for research here, both on the 
sources of chance error in our measurements, i.e., on determining 
the physiological, instrumental and other causes of low reliability 
of the dynamic measurement, which, for the moment, 
we have brushed aside as “chance error,” and on the more 
systematic specific factors discussed in the last paragraph. Our 
interest at this stage, however, has been to pursue the element of 
real validity, leaving the causes of non-validity for later examination 
wherever some validity is found. 

These last considerations raise a question to which both space 
and the roughness of data compel us to give only a tentative 
answer here. By taking validity as the mean correlation with 
the pool, we have implicitly assumed that only a single general 
factor is of importance. There is, however, some indication of a 
less clearly developed block of intercorrelating methods, addi¬ 
tional to the main block, including time and money expendi¬ 
tures, preference and information. It shows itself best in the 
correlations for one or two particular attitudes (6 and 10) where 
a significant cluster appears in Immediate Memory (failure to 
remember statements contrary to the attitude), Preference and 
Projection (averaging .36 and .17 in the respective attitudes) 
test and slightly in Information and P.G.R., but scarcely at all 
in time and money expenditures or memory for favorable, 
facilitating statements. This may be that special aspect of an 
attitude strength represented by unsatisfied drive, but until 
further studies confirm the pattern discussion would be pre¬ 
mature. 
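
The validity index discussed here, the mean correlation of a method with the rest of the pool, can be sketched as below. The matrix is hypothetical, the function name is ours, and a plain arithmetic mean is used for brevity where the study averaged through Fisher's Z.

```python
def internal_validity(corr, labels):
    """Mean correlation of each method with all the other methods in the pool."""
    out = {}
    for i, name in enumerate(labels):
        others = [corr[i][j] for j in range(len(labels)) if j != i]
        out[name] = sum(others) / len(others)
    return out

# Hypothetical symmetric correlation matrix for three methods (not data from the study).
labels = ["A", "B", "C"]
corr = [
    [1.0, 0.4, 0.2],
    [0.4, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
validity = internal_validity(corr, labels)
```

Such a mean approximates a loading on the first general factor only so far as a single factor accounts for the correlations; a second block of intercorrelating methods, as suggested above, would be averaged away by this index.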


Conclusions 

1. From the administration of tests of attitude strength 
(“interest in a defined course of action”) involving twelve 
different methods, applied to most of twelve different attitudes, 
the mean reliability of each method and the mean correlation 
of each method with the other methods was obtained. 

2. The reliabilities varied from moderate to good, but only 
eight of the methods had validities that were significant. 

3. The validities were defined as the mean correlation with a 
pool of four or six “tried” methods, which were set aside at the 
beginning as psychologically sound criteria and of some tested 
worth. These were:—Expenditure of Money, Expenditure of 
Time, Stated Preference in Paired Comparisons, Information 




Implementing a Course of Action, Immediate Memory and 
Psychogalvanic Response to statements concerning the atti¬ 
tude. Only the first four of these reached incontrovertible 
validities. 

4. The comparative failure of Immediate Memory and the 
P.G.R., despite good previous indications, seems traceable to 
complex relations, notably the Whately-Smith effect, differ¬ 
entiating the Memory response and the P.G.R. response (but 
in different ways) respectively to facilitating and frustrating 
verbal stimuli. 

5. In several methods where the interest response is mediated 
by the extent of the individual’s possession of some secondary 
personality factor large differences appear between the average 
magnitudes of the individuals’ mean responses to all attitude 
interests and it is then necessary to rescale the score ipsatively 
before correlating. 

6. Among the more “tentative” methods, which were correlated 
with the core of “tried” methods on one attitude each 
(but not with each other), the reliabilities were of the same 
satisfactory order. Promising validities were found for the 
methods of Distraction, False Belief and Speed of Decision, 
suggestions of validity were found for Fluency, while Misper¬ 
ception (Illusion) and Speed of Reading had no validity. 

7. All of the tests were very short (10 items each), the 
purpose of the investigation being only to pick out, among an 
array of new psychological approaches, those possessed of any 
validity at all. Lengthening of the tests would raise the val¬ 
idity of six of them to about 0.5, of four others to .3 or .4. 
Item analysis might raise it somewhat more, but the over-all 
results seem to indicate that some real specifics are neces¬ 
sarily being measured by the specific methods and that a satis¬ 
factory objective measure of an attitude will only be obtained 
by a battery employing four to six different methods. 

8. Various results, notably the existence of a cluster among 
some methods on the fringe of the main cluster, give slight 
indications that there is some functional separation of that 
part of the strength of an attitude which arises from its 
frustration. 

9. From the experience of the four experimenters in the design 
and conduct of the experiment some suggestions for improvement 
when carrying the research further are offered. 
Together with the methods explored in an extension of this 
research (8) the present methods constitute a set of eight new 
methods (Information, Immediate Memory, Preference, Speed 
of Decision, False Belief, Psychogalvanic Response, Projection 
and Distraction), additional to the criterion methods of Money 
and Time Expenditure and the classical Opinionaire (which 
they equal in validity), available for further use. Two directions 
of research now open up: (a) the improvement of the above 
valid methods by concentration on each technique singly, in 
relation to a standard validating core, (b) the exploration of the 
nine untried methods (Nos. 5, 7, 9, 12, 15, 17, 20, 23 and 24 in 
the primary list above) described in this same theoretical 
scheme. 

Since the successful contribution of psychology to the much 
needed integrating studies in the social sciences, with economics, 
anthropology and sociology, depends to a large extent on the 
psychologists’ ability to supply objective and accurate means of 
measuring strength of motive, interest or attitude, i.e., of dy¬ 
namic traits generally, it is to be hoped that the present 
exploration will be a foundation and stimulus for vigorous re¬ 
search in this area. 

The writers wish to express their gratitude to the Graduate 
Research Board of the University of Illinois and to the Social 
Science Research Council for funds contributing to the com¬ 
pletion of this research. 


REFERENCES 

1. Bruner, J. S. “Value and Need as Organizing Factors in Perception.”
Journal of Abnormal and Social Psychology, XLII (1947), 33-44.

2. Cattell, R. B. “Experiments on the Psychical Correlate of the
Psychogalvanic Reflex.” British Journal of Psychology, XIX
(1929), 357-386.

3. Cattell, R. B. “The Measurement of Interest.” Character and
Personality, IV (1935), 147-169.

4. Cattell, R. B. The Description and Measurement of Personality.
New York: World Book Company, 1946.

5. Cattell, R. B. “The Ergic Theory of Attitude Measurement.”
Educational and Psychological Measurement, VII (1947), 221-246.

6. Cattell, R. B. A Guide to Mental Testing. London: Univ. of London
Press, 1948.

7. Cattell, R. B. “Principles of Construction of Apperceptive or
Projective Tests of Personality.” Chapter 2, Projective
Methods (H. H. Anderson, ed.) New York: Wiley, 1949.

8. Cattell, R. B. “Ergic Structure in Man as Inferred from the
Measurement of Attitudes.” (In press.)

9. Cattell, R. B., Light, B., Maxwell, E. and Unger, M. “The Objective
Measurement of Attitudes.” British Journal of Psychology.
(In press.)

10. Chant, S. N. F. and Salter, M. D. “The Measurement of Attitude
Toward War by the Galvanic Skin Reflex.” Journal of Educational
Psychology, XXVIII (1937), 281-289.

11. Colman, R. D. and McCrae, C. R. “An Attempt to Measure the
Strength of Instincts.” Education, V (1927), 171-181.

12. Droba, D. D. “Methods of Measuring Attitudes.” Psychological
Bulletin, XXIX (1932), 309-323.

13. Gallenbeck, C. Systematic Analysis of the Characteristics of
Thinking and Belief. (In press.)

14. McNemar, Q. “Opinion-Attitude Methodology.” Psychological
Bulletin, XLIII (1946), 289-374.

15. Postman, L. and Zimmerman, C. “Intensity of Attitude as a
Determinant of Decision Time.” American Journal of Psychology,
LVIII (1945), 510-518.

16. Smith, W. W. The Measurement of Emotion. London: Kegan
Paul, 1922.

17. Super, D. E. and Roper, E. S. “An Objective Technique for
Testing Vocational Interests.” Journal of Applied Psychology,
XXV (1941), 487-498.

18. Thorndike, E. L. “How We Spend Our Time and What We Spend
It For.” Science Monthly, XLIV (1937), 464-469.

19. Thorndike, E. L. “What Do We Spend Our Money For?” Science
Monthly, XLV (1937), 226-232.

20. Thurstone, L. L. “The Theory of Attitude Measurement.”
Psychological Review, XXXVI (1929), 221-241.

21. Warden, C. J. Animal Motivation. New York: Columbia Univ.
Press, 1931.



THE CONSTRUCTION AND VALIDATION OF A 
WORK-TYPE AUDITORY COMPREHENSION 
READING TEST 

GEORGE SPACHE 
Chappaqua, New York 

We believe that there is a need for a test to determine the 
potential ability of students to comprehend and use high- 
school and college-level reading materials. This test should be 
relatively free from the influence of intelligence, as commonly 
measured, and independent of the influence of any reading 
difficulties of the individual. It should serve to indicate the 
possible performance level in silent comprehension and auding 
abilities. In our opinion, such a test would replace the use of
common intelligence tests in estimating potential reading ability.

Such a test would be preferable to the use of an intelligence 
test because the latter is not necessarily a good indicator of po¬ 
tential reading performance. Intelligence is itself a potential 
which is not achieved to equal degrees in all areas of communi¬ 
cation. There is no good reason why an intelligence test should 
be very closely related to reading ability or more significantly 
related to comprehension than to writing or speaking skills. 
We see no reason why one measure of potential general ability 
should be the best estimate of probable performance in many 
specific skills. 

A second reason against the use of intelligence tests to pre¬ 
dict reading comprehension is the extent of common content in 
such tests. Many intelligence tests actually function as reading 
tests and their results are merely a measure of reading status 
rather than an estimate of future or possible performance. 

Finally, intelligence tests do not function as accurate meas¬ 
ures of potential reading skill because reading performance is 
not dependent solely upon intelligence. Such factors as exposure
to reading materials, socio-economic status, attitudes
toward reading, etc., definitely influence reading performance.
These are sufficient to explain many of the observed discrepan¬ 
cies between intelligence and reading test results. 

For these reasons, we have attempted to devise a pair of 
comparable tests that would determine present reading com¬ 
prehension status and the potential ability of the student to 
improve his silent comprehension. The tests were arranged to 
parallel each other by selecting comparable passages from com¬ 
mon high school and college texts in science, literature and 
social science. Two forms of each test comparable in length, 
difficulty and types of reading passages were constructed. The 
Silent Comprehension Test requires the pupil to read the pas¬ 
sages and to answer questions in the usual manner. In the 
Auditory Comprehension Test, passages and questions are read
to the student. Thus, we obtain comparable measures of per¬ 
formance and potentiality. 

Possible uses of these tests are numerous. 1 The present status 
of an individual in ordinary silent comprehension can readily 
be determined. With this knowledge it is possible to detect the 
extent of comprehension difficulties. The use of the auditory 
type of test would indicate whether ordinary remedial pro¬ 
cedures, or specific training in auding skills (as auditory vocabu¬ 
lary, organizing and summarizing, taking notes, etc.) were 
necessary or likely to be profitable. To be specific, low scores in 
silent comprehension in the presence of average or better audi¬ 
tory comprehension would indicate that common remedial tech¬ 
niques would probably be profitable. Low scores in both tests 
would indicate a degree of low potential for high-school or 
college work not likely to be improved except by extensive and 
prolonged remedial help. Average or better scores in silent 
comprehension with low auditory comprehension would indicate
the need for special training in auding or auditory skills.
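
The three cases just described may be laid out as a simple decision rule. The following is only a sketch: the function name, and the reduction of each score to "average or better" versus "low", are assumptions made for illustration, since no cutting scores are given here. The fourth case (both scores average or better) is not discussed in the text and is labeled as an assumption in the code.

```python
def interpret_scores(silent_average_or_better: bool,
                     auditory_average_or_better: bool) -> str:
    """Map a pair of score levels to the interpretation suggested in the text."""
    if not silent_average_or_better and auditory_average_or_better:
        # Low silent comprehension, average or better auditory comprehension.
        return "common remedial techniques would probably be profitable"
    if not silent_average_or_better and not auditory_average_or_better:
        # Low scores on both tests.
        return "low potential; extensive and prolonged remedial help needed"
    if silent_average_or_better and not auditory_average_or_better:
        # Average or better silent comprehension, low auditory comprehension.
        return "special training in auding or auditory skills needed"
    # Both average or better: not covered by the text (assumption of this sketch).
    return "no special need indicated by these two scores"


print(interpret_scores(False, True))
```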

The results in terms of total score on the first edition of the 
Auditory Comprehension Test were correlated with other sec¬ 
tions of the Diagnostic Reading Test battery as well as measures 
of intelligence and reading. 

1 These tests may now be obtained from Dr. Frances O. Triggs, 419 West 119th
Street, New York 27, N. Y. They are published by the Committee on Diagnostic
Reading Tests, a non-profit corporation devoted to the study and improvement of
reading procedures.




In view of a reliability coefficient (S-B) of .788 for this edition,
these correlations would seem to support our hope that
this test would be a measure of factors operating in reading
comprehension. The relationships with the measures of silent
comprehension are of the order of .5; those with vocabulary and
intelligence of the order of .3, with the exception of that with the
Diagnostic Vocabulary Test; while those with rate, word analysis
and intelligence range from .4 downward.

TABLE 1

Relations of Scores on Auditory Comprehension Test to Various Other Measures

Auditory Comprehension and:

Terman McNemar IQ ............................ .358
Cleveland Reading, Vocab. .................... .358
Cleveland Reading, Comp. ..................... .512
Diagnostic Reading, Vocab. ................... .675
Diagnostic Reading, Gen. Read. Rate .......... .167
Diagnostic Reading, Gen. Read. Comp. ......... .493
Diagnostic Reading, Social Studies Rate ...... .299
Diagnostic Reading, Social Studies Comp. ..... .582
Diagnostic Reading, Word Attack .............. .400
Diagnostic Reading, Oral Reading (errors) .... .177


TABLE 2

Intercorrelation Matrix and Reliabilities (K-R 21) of Total and Part Scores on the
Diagnostic Reading Tests, Section II, Comprehension Part 2, Auditory Form A

(N = 162)

                                       Coefficients of Correlation

                                       2        3         4         5          6
                                                                               Social
                                                                               Sciences
                      No. of   K-R     Main               Conclu-   Physical   and
                      Items    21      Ideas    Details   sions     Science    Literature

No. of Items                           47       63        25        47         88
K-R 21                                 .72      .38       .48       .43        .66

1. Total Score        135      .73     .86      .58       .72       .78        .92
2. Main Ideas          47      .72              .23       .62       .64        .82
3. Details             63      .38                        .15       .5?        .5?
4. Conclusions         25      .48                                  .54        .7?
5. Physical Science    47      .43                                             .56

(? marks a digit that is illegible in the source.)

Our finding that there is a significant relationship between 
silent comprehension and the comprehension of material read 
to a student is similar to the results obtained by Swanson and
Anderson. 2 These authors also found that results in the two
situations tended to be markedly similar.

A second edition attempted to differentiate questions into 
subgroups determining the comprehension of main ideas, de¬ 
tails and conclusions. Questions on social science and literature 
were also distinguished from those in physical science in the 
hope that comprehension in these types of questions and sub¬ 
ject matter could be measured. Unfortunately, the intercorrela¬ 
tion matrix of part scores did not support this attempt. 

Reliabilities of the subscores range from .38 to .72 and the 
intercorrelations of sub-sections from .15 to .82. With the possi¬ 
ble exception of Main Ideas, none of the subscores is sufficiently 
reliable to justify its distinction. Item validities ranged in 
median values from 21.1 for Details, to 40.6 for Main Ideas and 
27.1 and 28.5 for Forms A and B, respectively. Median-item 
validities for Social Science and Literature and for Physical 
Science questions were 26.3 and 22.7. With this evidence, no 
attempt was made to differentiate types of questions or sub¬ 
ject matter in the final revision. 

Before undertaking the third and final revision, we thought 
it desirable to investigate the influence of chance and informa¬ 
tional background upon scores in the Auditory Comprehension 
Test. We have often felt that many questions in other reading
tests could be answered by a student without ever reading the 
test material. In fact, we confirmed this impression in a study 
of another test in the Diagnostic Test Battery. In a measure of 
silent reading comprehension, we believe that this situation 
would be highly undesirable, since it would vitiate the attempt 
to measure comprehension in a specific body of reading mate¬ 
rials. Since the purpose of the Auditory Comprehension Test is 
to measure potential, and not performance, in reading, the 
fact that the student may be able to answer a number of ques¬ 
tions even though he has not read the test material does not 
invalidate the test. If we are to measure potential, then the influence
of reading backgrounds and information should be allowed
to operate to a reasonable degree since they are contributors
to this potential.

2 Swanson, D. E., and Anderson, I. H., “A Comparison of Comprehension Scores
Obtained from Silent Reading, Oral Reading and Auditory Comprehension.” Unpublished
research as quoted by D. E. Swanson, in “Common Elements in Silent and
Oral Reading,” Psychological Monographs, XLVIII (1937), 36-60.

A group of 33 high-school pupils whose socio-economic status 
and intelligence were relatively high were able to answer a 
median of 58 per cent of the questions correctly. We do not 
know whether this figure should be greater or less, since we 
know of no comparable data. It would imply that about half 
of the questions of the Auditory Comprehension Test can be 
answered on the bases of intelligence, reading background and 
the other factors that influence reading skills. The remainder 
of the questions are, presumably, dependent upon the ability 
to comprehend specific high-school and college textual mate¬ 
rials. Thus the test may be measuring potential both by sam¬ 
pling the capacity for understanding a group of selections from 
common texts and by measuring the facility in using reading 
or informational experiences. 

The third and final editions of the Auditory and the Silent 
Comprehension tests were based on these experiences with the 
two preliminary editions. The parallel nature was preserved 
and the similarity between the tests increased by making them 
of the same length. The final editions are composed of approximately
50 items and require about one class period for administration.
We believe that judicious use of the tests will make
possible comparisons between present status and potential per¬ 
formance in reading and auding, as well as a prognosis of the 
probable outcome of remedial help or training in auding skills. 



VALIDATION AND STANDARDIZATION OF THE AGO 
GENERAL MECHANICAL APTITUDES TEST FOR 
THE SELECTION OF CIVILIAN EMPLOYEES IN 
WAR DEPARTMENT INSTALLATIONS 1 

ADAM PORUBEN, JR. 

Metropolitan Life Insurance Company 


During World War II, the Civilian Subsection of the Personnel
Research Section, The Adjutant General’s Office, was
engaged in the construction, standardization, and validation 
of various aptitude tests for the selection and placement of 
civilian personnel in various War Department installations. 
The General Mechanical Aptitudes Test was one of these tests. 
It was derived from four tests that already had shown some 
validity for the selection of employees for mechanical jobs. 
The study here reported was carried out in 1945. The writer, 
who was on the staff of the Civilian Subsection, was assigned 
this particular subject because he had been a teacher of Related 
Mathematics and Sciences for several years in the Saunders 
Trades School, Yonkers, N. Y., where this test was tried out, 
and he was, therefore, in a better position to evaluate the re¬ 
liability and validity of the criterion data than someone who 
was not acquainted with the school. 

1 This study was carried out while the writer was on the staff of the Personnel
Research Section of The Adjutant General's Office. The opinions expressed in this
article are the author’s and do not necessarily reflect the official attitude of the
Department of the Army.

This article reports only part of this study. The validation study was carried out
on six groups of students. This article reports the results obtained on the 11th-year
Technical major group. The rest of the study appears in the Journal of Psychology,
XXIX (1950), 113-155.

The writer makes grateful acknowledgement to Dr. E. E. Cureton and Dr. Erwin
K. Taylor for their encouragement in carrying out this study; also to Dr. Lawrence
Ashley, Mr. William Carey, and Mr. Patrick McHugh, who permitted this study to
be carried out in the Saunders Trades School, Yonkers, N. Y.

Purpose of Study

The immediate objective of this study was to determine the
validity of the General Mechanical Aptitudes Test for the prediction
of success in industrial and technical high schools.
The ultimate aim was to determine its validity for the selection
of civilian employees for various mechanical jobs in War Department
Installations. Since most of the graduates of the
Saunders Trades School go into their respective trades and
specialties upon graduation and are fairly successful in their
work, it was hoped that the validation of the General Mechanical
Aptitudes Test for the prediction of success in this school
would also give some indication of its validity for the selection
of employees for various mechanical jobs which are similar to
those for which the students were being trained in this particular
school.


The School 

The Saunders Trades School is an industrial and technical 
senior high school for boys, supported mostly by Federal, 
State, and local funds, but partly by private funds derived 
from the so-called Saunders Fund. This school serves the entire 
city of Yonkers, N. Y.; its normal enrollment is over 1,000 
students. At the time of this study, however, the enrollment 
was considerably under this figure because of war conditions. 

The Saunders Trades School offers two majors, industrial 
and technical. The industrial major has a duration of three 
years, the students being admitted after they have completed 
the ninth grade in one of the several junior high schools in 
Yonkers. This major is more or less terminal in nature in that 
most of the boys are expected to go to work in their respective 
trades upon graduation. The technical major also has a duration
of three years. Its graduates are expected either to become
junior engineers in industry or to go to engineering colleges for
further study. The series of courses in this major are so arranged
that the students can meet college entrance requirements
upon graduation.

The industrial major consists of seven curricula: Auto Me¬ 
chanics, Building Maintenance, Carpentry, Electric Installa¬ 
tion, Machine Shop, Plumbing, and Refrigeration. A student 
entering the industrial major selects one of these curricula.
All seven curricula are parallel in nature, but in each one the 
student pursues course and shop work along his particular 



2$6 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


specialty. The student devotes about half of his time to the 
shop and laboratory courses and the other half to related 
courses in mathematics, science and drafting. The work in 
these courses is fairly well integrated with the work in the 
shops; that is, the theory and the mathematics involved in a 
particular shop instruction unit are first discussed in the re¬ 
lated courses before the shop work is begun. For example, the 
student is taught the theory and mathematics of parallel cir¬ 
cuits in electricity before he performs the experiment in that 
project in the shop. The degree of integration between the 
related courses and the shop projects varies among the seven 
curricula, depending largely on the cooperation of the instruc¬ 
tors. 

The technical major consists of five curricula: Architecture, 
Industrial Chemistry, Electricity, Machine Design, and Power 
Generation. Each student pursues one of these for three years. 
As in the industrial major, the five curricula are parallel in 
nature, but, at the same time, specialized. Each curriculum 
consists of shop or laboratory work, related courses, and cer¬ 
tain academic courses such as English and American History. 
The related courses are integrated with the shop and labora¬ 
tory work. 

The Population 

This study was carried out on one of the three groups of 
students in the technical major, namely, the 11th-year group.
There were seventy-two students in this group distributed
among the five curricula as follows:

Curricula                          Number of Students

Architectural Course ..................... 10
Industrial Chem. Course ..................  8
Electrical Course ........................ 18
Machine Design Course .................... 22
Power Generation Course .................. 14

Total .................................... 72


Description of the Test

The General Mechanical Aptitudes Test was designed to measure
various aspects of mechanical aptitude. It consists of four
subtests as follows:

1. Mechanical Comprehension.—This consists of 43 three-alternative
multiple-choice items administered with a 15-minute
time limit. The items of this test were adapted, by permission
of the author, from Forms AA, BB, and W1 of the Bennett
Test of Mechanical Comprehension. It measures general mechanical
insight and the capacity of an individual to understand
mechanical operations.

2. Technical Reading.—This is a paragraph-and-question
test based on selections from technical manuals and texts.
It consists of 29 items administered with a 15-minute time
limit. The directions and a sample question are shown below:

DIRECTIONS 

This test consists of five paragraphs and some questions 
about each paragraph. There are 29 questions in all. Read 
each paragraph and then answer the questions which follow. 
Read the paragraph as many times as you need to in order 
to answer the questions. The first paragraph and the ques¬ 
tions based on it is a sample to show you what to do. 

The blast furnace is a great stone chimney 100 feet high 
or more. It is filled with a roaring fire from top to bottom. 

Into the top of the blast furnace are dumped carefully 
measured amounts of iron ore, coke, and limestone. After 
4 or 5 hours of terrific heat, molten iron is drained off 
from a door at the bottom of the furnace. 

46. A blast furnace is made of 

A. iron 

B. stone 

C. clay 

D. coke 

3. Paper Form Board.—This consists of 44 items, adminis¬ 
tered with a 10-minute time limit, and measures the ability 
to manipulate spatial images mentally. The directions for the 
test and a sample question follow. 

DIRECTIONS 

This test consists of 45 problems. At the top of each page, 
there are four large figures labeled A, B, C, and D, like those 
shown in the first row below. Each problem shows one of 
these figures cut into pieces and scattered around in a box. 
Look at the pieces in each box, and decide which one of the 
figures could be made if all the pieces in that box were fitted 
together. Some of the pieces may need to be turned around 
or turned over to make them fit. The pieces in each problem 
will make only one of the figures. The first problem is a sample. 







4. Shop Arithmetic.—This test consists of 20 free-answer
arithmetic reasoning problems based on shop arithmetic. Sixteen
of the items contain diagrams, tables, or drawings, and
the test is administered with a 20-minute time limit. A sample
problem is shown below:



When the larger pulley wheel makes 100 turns per minute, 
how many turns per minute does the smaller wheel make? 


Procedure 

Before the General Mechanical Aptitudes Test was given,
permission was obtained from the Yonkers school authorities 
to transcribe the school grades for use as criteria. These grades 
were copied from the school’s progress sheets which list the 
grades according to the curriculum and the year. In May, 
1945, the test was administered to 480 students of the Saunders 
Trades School by the teachers after a training session had been 
given by a staff member of the Personnel Research Section. 
All tests were scored at the headquarters of the Personnel 
Research Section.


Analysis and Results 

In order to see whether the seventy-two students used in 
this study constituted a fairly homogeneous group, the analy¬ 
sis of variance technique was used to investigate the question 
whether the four subtests of the General Mechanical Aptitudes 
Test differentiated significantly among the five curricula of 
the technical major. The results, which are not reported here, 
showed no significant F-ratios at the 1 per cent level among the
curricula within the technical major. Therefore, the 11th-year
students from the five technical major curricula were com¬ 
bined into one group and the analysis carried out on this 
group. 

1. The Criterion.—In a technical high school, such as the
Saunders Trades School, each technical subject has a definite 
place in the total pattern of instruction offered; that is, the 
class-room subjects such as mathematics, science, and theory, 
are definitely related to the shop or laboratory work. These 
class-room subjects provide the student with the basic knowl¬ 
edge and fundamental skill which will enable him to pursue 
his shop studies more intelligently. For example, in the Applied 
Mathematics course the students in the Electrical curriculum 
learn the basic mathematics connected with the series and 
parallel circuits; in the Basic Theory course they study the 
fundamentals of the series and the parallel circuits. With this 
background, the student can learn the shop work more easily 
and more intelligently. Moreover, the work in the classroom 
is fairly well integrated with the work in the shop; that is, the 
theory and the mathematics involved in a particular shop 
instruction unit are first discussed in the related courses before 
the shop work is begun. 

Because of this high integration of related subjects with the 
shop work, and because the technical subjects do represent 
the core of the curriculum of the Saunders Trades School, 
it occurred to the writer that a composite of the grades in 
these technical subjects would constitute a more valid criterion 
than a composite of all of the grades, including the more academic
subjects such as English, Economics, History, etc. Because
of these considerations, the composite of the grades received
by the students during their 10th and 11th years in
the technical subjects was taken as the criterion in the study.
This composite was obtained by the summation of 12 grades
in five different subjects, namely, Basic Theory, Shop, Physics,
Applied Mathematics, and Plane Geometry.

Estimate of the reliability of the criterion was obtained by 
correlating the sum of scores for the first terms with the sum 
of scores for the second terms. This reliability coefficient was 
found to be .88. When this was stepped up by the Spearman- 
Brown formula, it became .94. 
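
The stepping-up just described can be checked with a few lines of arithmetic. The sketch below applies the general Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), with k = 2 for a measure of doubled length; the function name is ours and is used only for illustration.

```python
def spearman_brown(r: float, factor: float = 2.0) -> float:
    """Reliability of a measure lengthened `factor` times, given reliability r."""
    return factor * r / (1.0 + (factor - 1.0) * r)


# Correlation between first-term and second-term sums, as reported above.
stepped_up = spearman_brown(0.88)
print(round(stepped_up, 2))  # 0.94
```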

TABLE 1

Intercorrelations Among Tests and the Criterion

(N = 72)

                     Mech.     Tech.      Paper Form   Shop
                     Comp.     Reading    Board        Arith.    Criterion

Mean                 19.556    17.667     33.971       13.653    919.940
S.D.                  7.708     5.664      5.475        2.800     83.650

Mech. Comp.                     .551       .476         .394      .493
Tech. Reading                              .217*        .563      .542
Paper Form Board                                        .262*     .417
Shop Arithmetic                                                   .454

* Not significant at the one per cent level.

2. Reliabilities of the Tests.—No estimates of reliabilities of
the four tests were made in this study. Such estimates, how¬ 
ever, were made previously by the Personnel Research Section, 
and were found to be quite satisfactory. 

3. Intercorrelations.—The intercorrelations among the tests
and the criterion are shown in Table 1. All of the correlations, 
except two, were found to be significantly different from zero 
at the one per cent level. 
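
A rough check of which coefficients clear the one per cent level with N = 72 can be sketched as follows. The significance test actually used is not stated here; the sketch assumes a two-tailed test based on Fisher's z transformation, which is only one of several possible choices, and the function names are ours.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.01) -> float:
    """Smallest |r| significantly different from zero (two-tailed, Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Fisher's z has standard error 1 / sqrt(n - 3); tanh converts z back to r.
    return math.tanh(z_crit / math.sqrt(n - 3))


def is_significant(r: float, n: int, alpha: float = 0.01) -> bool:
    return abs(r) >= critical_r(n, alpha)


threshold = critical_r(72)
print(round(threshold, 2))  # about .30 under this approximation
for r in (0.217, 0.262, 0.394):
    print(r, is_significant(r, 72))
```

Under this approximation the two starred coefficients (.217 and .262) fall below the threshold, while the smallest unstarred coefficient (.394) exceeds it.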

4. Multiple Correlations.—The multiple correlation was computed
by the usual Doolittle method and also by the Wherry-Doolittle
method in order to show the amount of shrinkage in
the R. By the Doolittle method, the multiple R for the entire 
battery was found to be .644 and .624 by the Wherry-Doolittle 
method. Thus, the shrinkage in the multiple R was fairly 
small, namely, .02. 
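
The figures above can be approximately reproduced from the correlations in Table 1. The sketch below is not the Doolittle procedure itself: it solves the same normal equations by ordinary elimination, and it applies a shrinkage formula of the form 1 - (1 - R²)(N - 1)/(N - k) to the full battery only. That form of the formula is an assumption on our part; it is used because it happens to agree with the reported shrunken value. All names in the code are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Order of tests: Mech. Comp., Tech. Reading, Paper Form Board, Shop Arith.
R_XX = [
    [1.000, 0.551, 0.476, 0.394],
    [0.551, 1.000, 0.217, 0.563],
    [0.476, 0.217, 1.000, 0.262],
    [0.394, 0.563, 0.262, 1.000],
]
R_XY = [0.493, 0.542, 0.417, 0.454]  # correlations of each test with the criterion
N = 72

betas = solve(R_XX, R_XY)                      # standardized regression weights
r2 = sum(b * r for b, r in zip(betas, R_XY))   # squared multiple correlation
multiple_r = math.sqrt(r2)

k = len(betas)
r2_shrunken = 1.0 - (1.0 - r2) * (N - 1) / (N - k)  # assumed shrinkage formula
shrunken_r = math.sqrt(r2_shrunken)

print([round(b, 3) for b in betas])
print(round(multiple_r, 3), round(shrunken_r, 3))
```

Because the published correlations are rounded to three places, the results agree with the reported values (.644 and .624) only to within rounding.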

When the shrunken multiple R was corrected for the attenuation
in the criterion, it became .643, or R² = .413. Thus the
battery accounts for about 41 per cent of the variance of the 
criterion. 
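
The correction applied here divides the coefficient by the square root of the criterion reliability (.94, as found above). A minimal sketch, with a function name of our own choosing:

```python
import math


def correct_for_criterion_attenuation(r: float, criterion_reliability: float) -> float:
    """Correct a validity coefficient for unreliability in the criterion only."""
    return r / math.sqrt(criterion_reliability)


corrected = correct_for_criterion_attenuation(0.624, 0.94)
print(round(corrected, 4), round(corrected ** 2, 2))
```

Starting from the rounded value .624, the result is about .644, which agrees with the reported .643 to within rounding; in either case the squared value is about .41.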











In order to show the contribution of each test toward the
efficiency of the battery, Table 2 is presented:

TABLE 2

Contributions of the Tests Toward Battery Efficiency

                                       Corrected*    % of Criterion
Test Battery       R²        R         R             Variance

B                  .2938     .543      .559          31.2
B, C               .3790     .616      .635          40.3
B, C, D            .3870     .622      .641          41.1
B, C, D, A         .3891     .624      .643          41.3

A = Mechanical Comprehension     B = Technical Reading
C = Paper Form Board             D = Shop Arithmetic

* Corrected for the attenuation in the criterion, the reliability coefficient being .94.


5. Beta Weights.—The beta weights were also found by the
Doolittle as well as the Wherry-Doolittle methods. Their values
are shown below:

Test                           Beta Weights

Technical Reading ................ .328
Paper Form Board ................. .240
Shop Arithmetic .................. .153
Mech. Comprehension .............. .137


6. Regression Equation.—The regression equation for predicting
the criterion from the four subtests of the General
Mechanical Aptitudes Test may be written in standard form as
follows:

$$z_0 = .328\,z_B + .240\,z_C + .153\,z_D + .137\,z_A$$

In order to get the equation in score form, the β’s were transformed
into the corresponding b’s. The resultant equation,
expressed in terms of deviations from the mean, is as follows:

$$x_0 = 4.844\,x_B + 3.667\,x_C + 4.571\,x_D + 1.487\,x_A$$
with a standard error of estimate of 64.08. 
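
The transformation from beta weights to score-form weights can be verified from the standard deviations in Table 1, using b = β × (S.D. of criterion) / (S.D. of test). The sketch below uses dictionary keys of our own choosing; the numerical values are those reported above.

```python
# Standard deviations and beta weights as reported; keys are illustrative labels.
SD_CRITERION = 83.650
SD_TEST = {"B": 5.664, "C": 5.475, "D": 2.800, "A": 7.708}
BETA = {"B": 0.328, "C": 0.240, "D": 0.153, "A": 0.137}

# Score-form weight for each test: beta rescaled by the ratio of standard deviations.
b_weights = {t: BETA[t] * SD_CRITERION / SD_TEST[t] for t in BETA}

for test, b in b_weights.items():
    print(test, round(b, 3))
```

The four values obtained this way agree with the coefficients of the score-form equation.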

Conclusions 

1. In general, the General Mechanical Aptitudes Test shows
fairly high validity for the prediction of academic success in 
the basic technical courses in an industrial or technical high 
school. The multiple correlation, when corrected for attenua¬ 
tion, was found to be .643. Thus, the General Mechanical Ap¬ 
titudes Test battery accounts for about 41 per cent of the 
variance of the criterion. 










When the Mechanical Comprehension Subtest is removed 
from the battery, the multiple R, corrected for attenuation, 
is .641 and the resultant three-test battery still accounts for
about 41 per cent of the variance of the criterion. Thus, for a
forty-five minute testing time, one can get a fairly good indication
of the probable success of a student in a technical high
school such as Saunders.

In the larger study, the efficiency of the battery for the pre¬ 
diction of success in specific subjects such as mathematics, 
science, shop, and theory, was studied. None of the multiple 
R’s in that study exceeded the multiple R found here. Only 
one of them, the one predicting success in Physics, equaled the 
multiple R found in this study. 

2. From the Beta Weights one can conclude that the Tech¬ 
nical Reading Test contributes most heavily to the prediction 
efficiency of the battery. There is so much technical reading 
required in the science, mathematics, theory, and shop courses 
of the technical high school that skill and speed in doing such 
reading contributes heavily towards academic success in such a 
school. 

3. Saunders Trades School is generally similar in students, 
curriculum, instructors, support, etc., to other industrial and 
technical high schools in New York State. Therefore, one can 
conclude that the General Mechanical Aptitudes Test is a valid 
test for the prediction of success in an industrial and technical 
high school in the state of New York. Since many states follow 
the N. Y. State pattern of vocational education, one could 
probably conclude that this test has similar validity for any 
such industrial or technical high school. 

4. Finally, since most of the graduates of the Saunders 
Trades School go into their respective trades and specialties 
upon graduation and are fairly successful in their work, the 
writer would conclude that this test should be fairly valid for 
the selection of employees for the various mechanical jobs for 
which training is given in this school. In other words, from this 
study one can infer that the General Mechanical Aptitudes 
Test should be valid for the selection of machinists, machine 
designers, electricians, power plant operators and technicians, 
junior industrial chemists, and junior architects. 



THREE AIDS IN THE EVALUATION OF THE 
SIGNIFICANCE OF THE DIFFERENCE 
BETWEEN PERCENTAGES 

C. H. LAWSHE and P. C. BAKER
Purdue University 

Those who construct and use paper-and-pencil tests are 
confronted with the task of making a lengthy item analysis. 
Many authors have offered various devices to aid in this work. 
All of these serve the purpose for which they were intended; 
however, there are within and among them several weaknesses, 
viz: 

1. They may not be truly time-saving. A considerable 
amount of additional computation may be required. 

2. They may give only gross approximations to the desired 
values. 

3. They may give results in terms of a statistic whose 
sampling error distribution is unwieldy. 

We here offer three instruments to be used in the evaluation
of the significance of the difference between two percentages:
Table 1, “The Significance of the Difference Between Percentages”;
Table 2, “The Omega Equivalent to a Percentage”;
Figure 1, a nomograph to estimate the significance of the
difference between percentages.

These instruments were devised with certain criteria in mind: 

1. The amount of calculation required shall be minimal. 

2. Restrictions placed upon the data shall be minimal. 

3. The results obtained shall be accurate to a degree commensurate
with published results.

4. The results obtained shall be subject to a “standardized” 
interpretation. There shall be no ambiguity. 

5. The instruments shall be compact; easy to use. 

Table 1.—Table 1 is the result of a direct approach to the
usual formula for the critical ratio of the difference between
two percentages to the standard error of that difference when



TABLE 1

The Significance of the Difference between Percentages

[Tabled values of Theta are not legible in this copy.]
















SIGNIFICANCE OF DIFFERENCE IN PERCENTAGES 265 


the sizes of the samples upon which the two percentages are
based are equal. (It is applicable only when $N_1 = N_2 = N$.)

$$t = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{N} + \dfrac{p_2 q_2}{N}}} \qquad (1)$$

$$\frac{t}{\sqrt{N}} = \frac{p_1 - p_2}{\sqrt{p_1 q_1 + p_2 q_2}} \qquad (2)$$

Let the right-hand member of equation 2 equal Theta. Theta
could be evaluated for all combinations of $p_1$ and $p_2$, but this


TABLE 2

The Omega Equivalent to a Percentage
(Omega positive for p > .50, negative for p < .50)

   p        Ω        p     |     p        Ω        p

  1.00    1.1106    .00    |    .75     .3702    .25
   .99     .9690    .01    |    .74     .3541    .26
   .98     .9099    .02    |    .73     .3379    .27
   .97     .8648    .03    |    .72     .3221    .28
   .96     .8259    .04    |    .71     .3064    .29
   .95     .7917    .05    |    .70     .2910    .30
   .94     .7602    .06    |    .69     .2756    .31
   .93     .7320    .07    |    .68     .2612    .32
   .92     .7052    .08    |    .67     .2453    .33
   .91     .6797    .09    |    .66     .2303    .34
   .90     .6565    .10    |    .65     .2155    .35
   .89     .6326    .11    |    .64     .2006    .36
   .88     .6104    .12    |    .63     .1861    .37
   .87     .5891    .13    |    .62     .1714    .38
   .86     .5697    .14    |    .61     .1568    .39
   .85     .5471    .15    |    .60     .1424    .40
   .84     .5287    .16    |    .59     .1280    .41
   .83     .5096    .17    |    .58     .1137    .42
   .82     .4911    .18    |    .57     .0994    .43
   .81     .4728    .19    |    .56     .0851    .44
   .80     .4550    .20    |    .55     .0708    .45
   .79     .4375    .21    |    .54     .0567    .46
   .78     .4201    .22    |    .53     .0424    .47
   .77     .4033    .23    |    .52     .0283    .48
   .76     .3867    .24    |    .51     .0141    .49
   .75     .3702    .25    |    .50     .0000    .50



would result in a table much too large for practical use; hence,
we have limited ourselves in Table 1 to combinations of $p_1$
and $p_2$ which are multiples of five.

To use Table 1 it is necessary only to multiply the tabled
value of Theta by the square root of N to find the critical
ratio.

$$t = \Theta\sqrt{N} \qquad (3)$$

Table 1 is useful in classifying a large number of differences
into three categories: (1) definitely significant, (2) doubtful,







(3) definitely not significant. Those differences falling in the 
doubtful category may then be more carefully evaluated by 
means of equation 1. 
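The use of Table 1 can be illustrated with a minimal Python sketch. It assumes the conventional meaning q = 1 - p, which the text does not spell out, and the function names `theta` and `critical_ratio_equal_n` are hypothetical names chosen for illustration only.

```python
import math


def theta(p1: float, p2: float) -> float:
    """Theta = (p1 - p2) / sqrt(p1*q1 + p2*q2), assuming q = 1 - p."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 + p2 * q2)


def critical_ratio_equal_n(p1: float, p2: float, n: int) -> float:
    """Critical ratio t = Theta * sqrt(N); valid only when both samples have size N."""
    return theta(p1, p2) * math.sqrt(n)


# Example: 60% versus 40% passing an item, 100 cases in each group.
t = critical_ratio_equal_n(0.60, 0.40, 100)
print(round(t, 3))
```

The same value results from substituting directly into the standard-error form of the critical ratio, which is why only Theta needs to be tabled when the two samples are equal in size.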

Table 2.—Further consideration of the inaccuracies inherent
in Table 1 due to the skewness of the sampling distribution of
p when the true value of p approaches 100% or 0% led to the
development of a statistic which is a function of p and which
has a constant standard error dependent only on the size of
the sample. Readers familiar with Fisher’s arctanh transformation
of r will recognize the utility of such a statistic. Kelly
(9, pp. 593-594) offers the development of such a statistic;
our development differs only in minor details.


$\Omega = f(p)$   (Omega is a function of p)

$d\Omega = f'(p)\,dp$   (First derivative)

$\sigma_\Omega^2 = [f'(p)]^2\,\sigma_p^2$   (Take variance of both sides)

$\dfrac{1}{N} = [f'(p)]^2\,\dfrac{pq}{N}$   (Let variance error be inversely proportional to N)

$f'(p) = \dfrac{1}{\sqrt{pq}}$

$f(p) = \displaystyle\int \frac{dp}{\sqrt{p(1-p)}}$   (Integrate)

$f(p) = 2\arcsin\sqrt{p} + C$

$\Omega = 2\arcsin\sqrt{p} + C$   (The desired function)
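The key step of this derivation can be checked numerically. The sketch below is an illustration, not part of the authors' procedure: it differentiates 2 arcsin √p by a central difference and confirms that the squared slope times pq/N comes out as 1/N whatever the value of p. The helper names are hypothetical.

```python
import math


def f(p: float) -> float:
    """The transformation without its constant: 2 * arcsin(sqrt(p))."""
    return 2.0 * math.asin(math.sqrt(p))


def numeric_derivative(func, p: float, h: float = 1e-6) -> float:
    """Central-difference estimate of func'(p)."""
    return (func(p + h) - func(p - h)) / (2.0 * h)


def variance_of_omega(p: float, n: int) -> float:
    """Approximate variance of Omega: [f'(p)]^2 * p*q / N."""
    slope = numeric_derivative(f, p)
    return slope ** 2 * p * (1.0 - p) / n


# The product N * variance should be close to 1 for every p.
for p in (0.10, 0.30, 0.50, 0.90):
    print(p, round(50 * variance_of_omega(p, 50), 6))
```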

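To test the significance of the difference between two percentages
we need only to transform them to Omegas and apply
the critical ratio formula:

$$t = \frac{\text{difference}}{\text{S. E. of difference}} = \frac{(2\arcsin\sqrt{p_1} + C) - (2\arcsin\sqrt{p_2} + C)}{\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$$

$$= \sqrt{\frac{2N_1N_2}{N_1 + N_2}}\left[\sqrt{2}\left(\arcsin\sqrt{p_1} - \frac{\pi}{4}\right) - \sqrt{2}\left(\arcsin\sqrt{p_2} - \frac{\pi}{4}\right)\right] \qquad (4)$$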



The reason for the algebraic manipulation, factoring out the
square-root of two in the numerator, will be clear from the
following. The constant of integration is evaluated as $-\pi/2$
in order to make the function symmetrical about 50%.

Fig. 1.

If $N_1$ equals $N_2$ formula 4 reduces to

$$t = \sqrt{N}\,(\Omega_1 - \Omega_2)$$







Omega is now defined as

$$\Omega = \sqrt{2}\left(\arcsin\sqrt{p} - \frac{\pi}{4}\right)$$

Table 2 contains values of Omega for all values of p. These
values are positive for p greater than 50% and negative for p
less than 50%.
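As an illustration, the definition of Omega can be evaluated directly. The following Python sketch uses a hypothetical function name `omega` and works in radians; the values it prints can be compared with the tabled entries.

```python
import math


def omega(p: float) -> float:
    """Omega = sqrt(2) * (arcsin(sqrt(p)) - pi/4), with p given as a proportion."""
    return math.sqrt(2.0) * (math.asin(math.sqrt(p)) - math.pi / 4.0)


# Print a few values for comparison with the tabled entries.
for p in (1.00, 0.95, 0.75, 0.60, 0.50, 0.25):
    print(f"{p:.2f}  {omega(p):+.4f}")
```

Because the function is symmetrical about 50%, the Omega for a proportion p below .50 is the negative of the Omega for 1 - p, which is why one column of Omega values serves both halves of the table.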

For simplicity in notation we here define Omega lower case
as the difference between two Omegas:

$$\omega = \Omega_1 - \Omega_2$$


TABLE 3

The 1%, 5%, 10% Confidence Values of Theta and Omega

Confidence   Use with Table 1                     Use with Table 2 and with Figure 1
level        (N's are equal)                      (N's are equal)                      (N's are unequal)

1%           $2.5758/\sqrt{N} = \Theta_{.01}$     $2.5758/\sqrt{N} = \omega_{.01}$     $2.5758\Big/\sqrt{\dfrac{2N_1N_2}{N_1+N_2}} = \omega_{.01}$

5%           $1.9600/\sqrt{N} = \Theta_{.05}$     $1.9600/\sqrt{N} = \omega_{.05}$     $1.9600\Big/\sqrt{\dfrac{2N_1N_2}{N_1+N_2}} = \omega_{.05}$

10%          $1.6449/\sqrt{N} = \Theta_{.10}$     $1.6449/\sqrt{N} = \omega_{.10}$     $1.6449\Big/\sqrt{\dfrac{2N_1N_2}{N_1+N_2}} = \omega_{.10}$



To use Table 2 find the Omega equivalents of the two percentages,
find the algebraic difference between the two Omegas,
multiply this difference by $\sqrt{2N_1N_2/(N_1+N_2)}$ if the two samples
differ in size, or multiply the difference by $\sqrt{N}$ if the samples
are of equal size. This yields a critical ratio which can be
evaluated in terms of the normal probability function, or Student’s
distribution of t if $N_1 + N_2$ is less than 30, in which case
the degrees of freedom associated with t are equal to $N_1 + N_2 - 2$.

$$t = \omega\sqrt{\frac{2N_1N_2}{N_1 + N_2}}$$

$$t = \omega\sqrt{N}$$

Figure 1.—Figure 1 is a graphic representation of Table 2.
To use this nomograph, find $p_1$ and $p_2$, and join these points
by a straightedge. Where the straightedge crosses the center
scale find omega ($\omega$). This value is identical with that found
from Table 2, i.e., $\omega = \Omega_1 - \Omega_2$, and is used in the same manner.
This nomograph¹ is of use when a large number of differences
are to be evaluated and classified as significant, doubtful, and
not significant.

A Shortcut.—When a great many differences are to be evaluated,
as in an item-analysis study, the following shortcut is
suggested. Instead of multiplying each Theta or Omega value
by the square-root of N, find the Theta value or Omega value
corresponding to critical ratio values significant at various
desired levels of confidence by performing the operations suggested
in Table 3, “The 1%, 5%, and 10% confidence values
of Theta and Omega.”
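A minimal Python sketch of this shortcut is given below, assuming the three normal-curve values shown in Table 3. The function names and the helper `strictest_level` are hypothetical and serve only to show how precomputed cutoffs replace a multiplication for every item.

```python
import math

# Normal-curve critical ratios for the 1%, 5%, and 10% levels (from Table 3).
Z_VALUES = {"1%": 2.5758, "5%": 1.9600, "10%": 1.6449}


def cutoffs_equal_n(n: int) -> dict:
    """Cutoffs for Theta or omega when both samples have N cases."""
    root_n = math.sqrt(n)
    return {level: z / root_n for level, z in Z_VALUES.items()}


def omega_cutoffs_unequal_n(n1: int, n2: int) -> dict:
    """Cutoffs for omega when the samples differ in size."""
    scale = math.sqrt(2.0 * n1 * n2 / (n1 + n2))
    return {level: z / scale for level, z in Z_VALUES.items()}


def strictest_level(value: float, cutoffs: dict):
    """Return the most stringent level that the value attains, or None."""
    for level in ("1%", "5%", "10%"):
        if abs(value) >= cutoffs[level]:
            return level
    return None


cutoffs = cutoffs_equal_n(100)      # computed once for the whole study
print(cutoffs)
print(strictest_level(0.2848, cutoffs))
```

The cutoffs are computed once per study; each item then requires only a comparison of its Theta or omega with the three values.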

Figure 1 can be converted into a “tailor-made” nomograph
for one particular study by marking the 1%, 5% and 10%
confidence values of Omega on the center scale, and by writing
frequency values corresponding to proportions of $N_1$ and $N_2$
along the $p_1$ and $p_2$ scales.

Summary 

Three instruments, two tables and a nomograph, to be used 
in the evaluation of the significance of the difference between 
two percentages have been offered. The research worker with 
a large number of differences to evaluate can, with three simple 
calculations, determine which of his items attain the 1%, 5%, 
or 10% levels of significance. 

REFERENCES 

1. Burr, I. W. and Hobson, R. L. “Significance of Differences in
Proportions with Constant Sample Frequencies in Each
Pair.” Journal of Educational Research, XXXIV (1943),
307-312.

2. Davis, F. B. Item Analysis Data. Harvard Education Papers,
No. 2. Cambridge, Mass.: Harvard Univ. Press, 1946.

3. Dunlap, J. W. and Kurtz, A. K. Handbook of Statistical Nomographs,
Tables, and Formulas. Yonkers-on-the-Hudson:
World Book Company, 1932.

4. Dunlap, J. W. “Note on Computation of Bi-Serial Correlations.”
Psychometrika, I (1936), 51-58.

1 The authors have copies of the nomograph (8½ x 11") which they will supply
gratis in response to single copy requests. Address the senior author.




5. Dunlap, J. W. “Nomograph for Computing Bi-Serial Correlations.”
Psychometrika, I (1936), 59-60.

6. Edgerton, H. A. and Paterson, D. G. “Table of Standard Errors
and Probable Errors of Percentages for Varying Numbers
of Cases.” Journal of Applied Psychology, X (1926), 378-391.

7. Guilford, J. P. “The Phi Coefficient and Chi Square as Indices
of Item Validity.” Psychometrika, VI (1941), 11-19.

8. Jurgensen, C. E. “Tables for Determining Phi Coefficient.”
Psychometrika, XII (1947), 17-29.

9. Kelly, T. L. Fundamentals of Statistics. Cambridge, Mass.: Harvard
Univ. Press, 1947.

10. Lawshe, C. H., Jr. “A Nomograph for Estimating the Validity
of Test Items.” Journal of Applied Psychology, XXVI (1942),
846-849.

11. Lichte, W. H. “A Method and Tables for Obtaining Standard
Errors of Differences Between Proportions When N is Equal
to N.” Journal of Applied Psychology, XXXI (1947), 449-456.

12. Long, J. A. and Sandiford, P. The Validation of Test Items.
Department of Educational Research, No. 3. Toronto: University
of Toronto, 1935.

13. Mosier, C. I. and McQuitty, J. V. “Methods of Item Validation
and Abacs for Item Test Correlations and Critical Ratio
of Upper-Lower Difference.” Psychometrika, V (1940), 57-



A STUDY OF FAKING ON THE KUDER PREFERENCE
RECORD¹

ORRIN H. CROSS 
University of Alabama 

Since vocational guidance counselors and employment offices
of industrial concerns use it so frequently, the author of the
present paper felt that the possibility of faking the Kuder
Preference Record needed investigation. Examination of the
items of this inventory reveals many which apparently would
be quite transparent even to the average individual. This possibility
throws some doubt on the advisability of using it except
when wholehearted cooperation of the subject is assured.

A recent paper by Longstaff (1) on a similar problem has
indicated that both the Kuder and the Strong Vocational Interest
Blank for Men are susceptible to multiple faking, i.e.,
faking upward on some of the scales and downward on the
remaining ones. The present study differs from Longstaff’s
in several ways. In the first place, Longstaff’s subjects were
mature students in an evening Extension Division class in
Vocational Development and Personnel Psychology at the University
of Minnesota, presumably somewhat sophisticated in
psychological test taking; the subjects for this study were drawn
from a high-school group, probably quite unsophisticated psychologically.
Secondly, Longstaff had his subjects attempt multiple
faking, upward on the mechanical, scientific, artistic,
literary, and musical scales of the Kuder, downward on the
remainder; the subjects for the present study were instructed to
fake either up or down on just one scale at a time. Finally,
check studies were made in the present case to determine
whether previous acquaintance with the test might have been


1 This research was supported in full by a grant from the Research Committee of
the University of Alabama. Papers based on part of these data were presented at the
1949 meetings of the APA and the Southern Society for Philosophy and Psychology.




a factor in success and also whether differences in age and 
education might have been a factor.

Procedure

The construction and standardization of the test is reported
elsewhere (2) and consequently will not be reviewed here.

Two short preliminary studies were done in a small southern 
high school, both of them on one scale (the Mechanical), with 
one sex (male). In the first study, all seventh- and eighth-semester
students who could find time to take the inventory
were tested. The highest ten males on the Mechanical scale 
were then asked to re-take the test with the instructions to 
fake a low interest in the mechanical field of work. Several 
of the lowest ten also re-took it with instructions to fake high 
interest. Both groups were successful. 

In the second preliminary study, the procedure was varied 
in that the students were asked to fake a high mechanical 
interest prior to any acquaintanceship with the test. The dif¬ 
ference obtained between this group of 36 boys and that of the 
Kuder norm group of high-school boys proved to be significant 
at the .01 level (actually there were fewer than 6 chances in 
100,000 that such a difference could occur by chance). 

The study being reported here used the method of the first
preliminary study,² i.e., (1) honest test administered by school
authorities, (2) selection of high and low scoring individuals
on each scale, (3) retest with instructions to “fake” an interest
in the opposite direction.³ The subjects had not been informed
of the results of the honest test.

2 In order to secure the cooperation of the high-school authorities, this procedure
had to be followed.

3 A copy of the directions to fake follows:

Directions

The inventory you did previously was done as a student, a non-employed student,
honestly giving a picture of his own interests. We would now like your cooperation
in doing this inventory as a person who was looking for a special kind of job might;
a person who might want to “fool” the test. We want you to help us find out if this
can be done.

Directions for Faking Low

You are now to pretend that your doctor has warned you that to take ______* work
would mean almost certain death. If you show high interest in that kind of work you
will be forced to take it in spite of this fact.

You must “slant” the test results so that you will not have to take this type of work;
so that you show as little interest in ______* work as you can. Thus you will score the one



FAKING ON THE PREFERENCE RECORD 2JJ 

About six hundred (600) high-school students in four of the 
five high schools of a large southern city took the honest test. 
From this group were selected 364 students for retesting—181 
males and 183 females. It seemed desirable to compare within 
the sexes because norms on the sexes differ. 

A check study with college students from beginning psy¬ 
chology classes was made, using the method of the second 
preliminary study, i. e., fake test without previous experience 
with the inventory. Means and standard deviations were com¬ 
puted for a comparable college group from the same campus 
for an honest taking of the inventory. Comparisons were then 
made between the two college groups. This group was not 
asked to fake low because it appeared to the author that faking 
low would not be a significant problem in the situations in 
which the results of the research would be useful. In the guid¬ 
ance and industrial situations the peaks of a profile are re¬ 
garded as of positive significance, while the low scores are 
commonly used for their negative value, if at all. 

Finally, a group of 67 college students who had taken the 
inventory were asked to rank the interest fields in order of 
what they thought their test profiles would show. In this part 
of the study, a list of the scales was presented, each one fol¬ 
lowed by a list of from four to fifteen of the occupations listed 
by Kuder in his Manual as being representative of the occupa¬ 
tions in that field of major interest. 

Results and Discussion 

It will be noted from Table I that the high-school students 
were quite successful at the assigned task, the probabilities 

item of each trio which appears to you to be least indicative of ______* interest in column
headed “1”, and the one most indicative of ______* interest in column “3”; make no mark
opposite the other activity.

Directions for Faking High

You are to pretend you very much want a particular job. If you show a large
amount of ______* interest on this test you have it “cinched”. The job does not necessarily
involve ______* work, but you must show very high interest in such things.

You must “slant” the test results so that you will appear to have a great deal of
interest in ______* work. Thus you will score the one item in each trio which appears
to you to be most indicative of ______* interest in the column headed “1”; and the one
least indicative of ______* interest in the column headed “3”; make no mark opposite
the other activity.

* The name of the scale being faked was inserted in each of the blanks. At the
end of the directions was appended a list of the occupations chosen from Kuder’s
lists for the scale being faked.




TABLE 1

[Table contents are not legible in this copy.]



that the observed differences were due to chance being less
than .01, with the exception of the females faking low on the
Persuasive scale. On this scale, the probabilities lay between
.02 and .01. If the results are corrected for a deviant case,⁴
who actually raised her score, this probability also drops to
less than .01. Each sex was compared to the Kuder Manual
norms as a measure of its faking ability. Neither sex failed to
fake high successfully on any one of the scales. On the other
hand, both sexes failed to successfully fake low on scales four
(Persuasive) and nine (Clerical), and the females also failed on
scale two (Computational). If these results are each corrected
for a single deviant case whose fake score proved to be higher
rather than lower on the retest, the probabilities drop to less
than .01 on all the scales except the Persuasive for females.
This probability is .075. Comparison of the sexes, scale by
scale, failed to reveal any significant differences between them.
The “t” values ranged between .090 and .681 for from 13 to 21
degrees of freedom.

College males and college females proved about equally excellent
at faking high when compared to the norms derived in
this study. For the male group the “t” values ranged between
3.93 and 25.41 (median = 12.09) for degrees of freedom between
63 and 66 for the various scales. For the female group the
“t” values fell between 6.49 and 26.88 (median = 10.04) for
degrees of freedom between 128 and 134. Comparison of the
sexes revealed significant differences between them on the Persuasive
scale only, with the males showing superiority there.
The “t” value here was 6.353 for 21 degrees of freedom.

Finally, the high-school and college groups were compared. 
No significant differences were evident here. The “t” values 
obtained fell between 2.04 (for 14 degrees of freedom) and 
.049 for from 14 to 21 degrees of freedom. 

Inspection of the pertinent data (Table 1) indicates that 
faking high is easier than faking low. The male sex faked high 
better than low on all but the Mechanical scale, the female sex 

4 The author assumed that when a subject instructed to fake low not only failed
to lower his score on the retest, but raised it, he was not following directions either
because he misunderstood or because he did not wish to cooperate. Critical ratios,
except where noted, were calculated with such deviant cases included.




faked high better than low on all but the Scientific, Social
Service, and Clerical scales. Faking high is a more important
ability in the situations in which the test is applied; consequently,
this finding appears pertinent.

Faking high appears to be somewhat easier for the college 
group than for the high-school group although the difference 
does not reach the .01 level of significance on any scale for 
either sex. Differences favoring the high-school groups occurred 
on the Computational and Artistic scales for the male sex, and 
for the Persuasive and Artistic scales for the female sex. 

The mean rank difference correlation of the college group 
which attempted to predict what its order of standing on the 
nine scales would be was +.67. This coefficient is significant at 
the .01 level. 
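The paper does not state which formula was used for the rank difference correlation. The Python sketch below assumes the usual Spearman rank-difference coefficient without ties and applies it to one hypothetical student's predicted and obtained rankings of the nine scales; the rankings are invented for illustration and are not data from the study.

```python
def rank_difference_correlation(predicted, obtained):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(predicted)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(predicted, obtained))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Hypothetical ranks (1 = highest interest) on the nine scales.
predicted_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9]
obtained_ranks = [2, 1, 3, 5, 4, 6, 7, 9, 8]

print(rank_difference_correlation(predicted_ranks, obtained_ranks))
```

Under this assumption, a mean coefficient for the group would be obtained by computing one such value per student and averaging.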

The uses to which such an inventory is put need examination. 
First, it is used in vocational and educational guidance in 
public schools, colleges, guidance centers, and the employment 
services; and, second, it is used in the selection of workers for a 
job in business and industry. 

What do the results reported mean, then? In the first case, 
guidance, rapport may be justifiably assumed. In this case, 
such a fakable inventory retains usefulness. However, in the 
second instance no such assurance of cooperation exists. On 
the contrary, it might reasonably be assumed that testees 
take the opposite attitude, consciously or unconsciously. On 
the basis of the reported results, it might be asserted that such 
an inventory as this must be interpreted with caution in the 
industrial situation, at least where the applicant has any ink¬ 
ling of the job he is being considered for. 

Longstaff (1) has suggested the analysis of this inventory 
after the fashion of the “K scale” of the Minnesota Multiphasic 
Personality Inventory to seek a correction for faking. That was 
the original intent of the study here reported, but analysis of 
some of the data obtained failed to reveal enough items for 
such a scale. Another possibility in selecting workers for a job 
might be the use of the whole profile, on the assumption that 
secondary (and less transparent) peaks, and low scores might 
prove to be discriminating. 





Conclusions 

The results reported above appear quite consistent in their 
implications. In only one case did a group (in this study, the 
high-school females, faking low on the Persuasive scale) fail 
to perform the task successfully as compared to its own honest 
tests. Correction of this result for a deviant case brought that 
result to significance also. It thus appears that a subject suitably
motivated may successfully fake the Kuder Preference
Record.

As shown by the present study, when an applicant for a job 
has any idea of what job he is being considered for, his scores 
should be interpreted in the light of the knowledge that faking 
is possible if he desires to fake. In the properly motivated guid¬ 
ance situation, this problem does not arise. 

REFERENCES 

1. Longstaff, H. P. “Fakability of the Strong Interest Blank and the
Kuder Preference Record.” Journal of Applied Psychology,
XXXII (1948), 360-369.

2. Revised Manual for the Kuder Preference Record. Chicago: Science
Research Associates, 1946.



PSYCHOLOGICAL TESTING FOR IMMIGRANTS IN A 
VOCATIONAL COUNSELING AGENCY 1 


BENJAMIN BALINSKY 

United Service for New Americans and City College of New York 

The Vocational Services Department of the United Service 
for New Americans aids recent immigrants to achieve voca¬ 
tional adjustment. There is no established testing program, but 
outside testing facilities have been utilized on occasion. The 
question arose about whether or not to increase the utilization 
of tests for the recent immigrants. Ordinarily, the matter would 
have been directly answered on the basis of precedent that tests 
are widely accepted by counseling agencies. However, since the 
Vocational Services Department has recent immigrant clients, 
the matter of testing them more regularly was enmeshed in the 
broader problems of test validity and interpretation. 

Design of Study 

It is known that the tests employed in counseling have been 
standardized on American-speaking and American-acculturated 
populations. The recent immigrants not only do not understand 
the American language idiom well but have had extraordinary 
personal experiences that make a testing program for them one 
that requires careful study. It was decided that the design of 
the study have two phases: 

1. To discover the particular needs of the clients and the
counselors who serve them.

2. To try out various psychological tests and techniques.

The first task has been completed. The second phase has only 
begun. The first phase was accomplished by means of the 
following: 

1 This paper is adapted from a report read at the Jewish Occupational Council 
Eastern Regional Conference, February 18, 1949. The writer wishes to express his 
sincere appreciation to Mr. William Karp, Director, Vocational Services Department, 
for his invaluable aid in starting and working through the Psychological Services 
Program. 





PSYCHOLOGICAL TESTING FOR IMMIGRANTS 279 

1. Conferred with supervisors and counselors on client needs 
and their own. 

2. Observed interviews. 

3. Interviewed directly, especially the more difficult clients. 

4. Attended and participated in administrative staff meetings.

5. Studied case records. 

6. Conferred with representatives of the Family Service De¬ 
partment because of the close working relationship with 
the Vocational Services Department. 

The second phase will be accomplished by cooperating with 
outside testing facilities where the immigrants will be examined. 
The test results will be studied against interview, case history 
and vocational data, and the test battery modified from time to 
time as the results merit. 

Immigrants as Special Problems 

The question may be raised as to whether or not the recent 
immigrants present testing problems different from the usual 
client. From phase one of the study it was learned that: 

1. The recent immigrant is generally older—42.5% were 40 
years of age or older; 30% were 45 years of age or older; 
22% were 50 years of age or older. 

2. 82% had been in this country one year or less. 

3. The largest number of applicants had been in business, 
salesmen, or office workers in Europe. 

4. 11% were handicapped by general health or a specific phy¬ 
sical impairment. 

5. 70% were on relief and over 95% were known to the Family 
Service Department at some time. 

6. Almost all the recent immigrants had more or less difficult 
social and personal adjustments to make at the same time 
they were making a vocational adjustment. 

7. Counselors wanted help with understanding the personality 
of the particular immigrants. They felt that the immigrant 
clients were more difficult to understand than the American 
clients with whom they were familiar. 

8. Counselors indicated that a routine interpretation of tests 
was of little value. 

These findings put the immigrant in a special class that may 
well be compared to the so-called handicapped groups. Just as 
one does not proceed on the same basis with the handicapped as 
with the non-handicapped, the same cautions must be exercised 
with the recent immigrant. One would not rely upon oral tests 




for the deaf or written tests for the blind, and one must seek 
tests that are more valid for the immigrants. 

Testing the Immigrant

In testing the immigrant, we run into the issue of mechanical 
or dynamic testing and interpretation. Psychological tests have 
been generally accepted as part of the total process of voca¬ 
tional counseling. However, the use made of tests varies con¬ 
siderably from more or less routine mechanical interpretation 
of the results to the dynamic interpretation where predictions 
are based on all there is known about measurement principles, 
the particular test and the particular individual being tested. 
Where the individual has only a vocational problem uncompli¬ 
cated by difficult social and personal adjustments and where the 
individual has had normal opportunities for development, a 
prediction based upon the specific test results will probably be 
valid. However, where this is not so, as in the instance of recent 
immigrants, then the prediction must be based on more factors 
than the specific test results. 

Apropos of this issue I refer to the case of Hans K., as re¬ 
ported in the Jewish Occupation Council, Program and In¬ 
formation Service, Release j^CM-9. Hans was an immigrant 
about 24 years of age. Tests had been given him twice. The first 
time they pointed up the need for psychiatric referral. The second 
time tests were given the statement was made that, “his general 
pattern of abilities has not changed and he has not improved 
much in abilities where learning power is involved, such as vocabulary.”
The statement continues, “on the basis of these test results
it appeared that Hans could not profit from formal training.
An occupation requiring either gross or precise manual dexterity, 
speed and some accuracy would be most suitable for him.” 

However, as a result of the psychiatrist’s statement that 
Hans might react with a neurotic or psychotic breakdown “if 
he could not anticipate his unrealistic vocational aspirations,” 
the results were reviewed. It was decided that typing and book¬ 
keeping might not be contra-indicated. Hans went on to make 
a good adjustment in office work and even successfully accom¬ 
plished some part-time college work. 

Here predictions of success were based upon specific test results
without fully taking into consideration the language barrier
and the emotional complications. Hans had rated an IQ
of 137 on the Performance part of the Wechsler-Bellevue Intelligence
Scale and an IQ of 111 on the Full Scale but only an IQ of
85 on the Verbal Scale, this test having been part of the battery
given previously. An IQ of 137 on the Performance part shows
a very high level of intelligence and indicates that the low
Verbal IQ is most likely not permanent but very probably
temporarily depressed. It should also have been known that
Hans was interested in office work. Considering all the test re¬ 
sults and what was known about Hans, the recommendation 
for an occupation requiring either gross or precise manual dex¬ 
terity seems like clutching at straws. There seemed to be in¬ 
complete evaluation of the test results, especially in terms of 
Hans’ particular background and personality. The need for a 
more dynamic interpretation was sharpened by the fact that 
Hans was an immigrant with emotional difficulties. 

It has often been remarked that the immigrant does not have 
different problems, but rather more of the same that every one 
else has. But more of the same, a quantitative difference, makes 
eventually for a qualitative change. A person may be anxious 
upon occasion, but another may be always anxious. This quan¬ 
titative difference makes for a different style of life, different 
kinds of adjustment. It calls for recognition by tests and by the 
counselor in evaluation and adjustment. Water will still be 
water at 99 degrees C. but at 100 degrees it will be steam and 
the properties will change. This qualitative change demands 
different handling. So it is with the recent immigrant. He may 
have a little more or much more anxiety, suffer more from the 
difference between his expectations and reality, have a greater 
tendency to conflict between the need to be independent and 
dependent. But because he has more, his problem is not only 
greater but different. He has less language facility, his home sit¬ 
uation is less favorable, he has had fewer opportunities to make 
adjustments on his own here and to see their results. Because of 
this, also, he reacts differently. 

Greater attention must be paid to the whole person in testing 
and evaluating him. We are making predictions on the basis of 
test results and these predictions must be based on all the evidence.
It is necessary to take into account: (1) theories and
principles of measurement, (2) the standardization of the tests
themselves and (3) the particular person being tested.

One of the major theories in measurement that is significantly 
related to the testing of the immigrant is that of the effect of the 
environment on present test abilities. There is ample evidence, 
both experimental and clinical, which demonstrates that an 
environment different from that of the group upon which the 
test was standardized will lead to results that require explana¬ 
tion. Since we are to predict the probability of adjustment to 
new situations we must include the possibility of accelerated 
growth when in a new environment, especially one that is more 
favorable for growth in the expected direction. Specifically, for 
the immigrant, this means we must be able to predict his ability 
level after a period of time. After a while when he becomes more 
accustomed to American ways, feels more secure and under¬ 
stands the language better, his actual test results may rise. 
We must be able to predict the approximate rise in the present 
test results. And this we cannot do unless we take into account 
all factors about the tests and the individual. 

It is necessary to know the validity, reliability and the norm 
populations for each test in order to interpret the results on 
the tests. Validity is most important since, if a test does not 
measure what it is supposed to, it matters not how consistently 
it measures something else or from what population the norms 
were derived. Moreover, the validity of tests is not so high as 
to measure with infinitesimal error. Most intelligence tests have 
validity coefficients of from about .80 to .90 and aptitude test 
validity coefficients approximate .60 as the modal instance. 
This means that the error of prediction may be quite large for 
any one individual. This error can be reduced by studying all of 
the test results in terms of the individual's present state and 
background. 
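
The size of the error of prediction mentioned above can be illustrated with the usual standard error of estimate, which is the criterion standard deviation multiplied by the square root of one minus the squared validity coefficient. The following is a minimal sketch under that textbook formula; the function name and the choice of a criterion SD of 1.0 are assumptions made for illustration and are not taken from the article.

```python
import math


def standard_error_of_estimate(validity, criterion_sd=1.0):
    """Return criterion_sd * sqrt(1 - r^2) for a validity coefficient r."""
    if not -1.0 <= validity <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return criterion_sd * math.sqrt(1.0 - validity ** 2)


# Validity coefficients of the order quoted in the text.
for r in (0.60, 0.80, 0.90):
    see = standard_error_of_estimate(r)
    print(f"validity {r:.2f}: error of estimate is {see:.2f} criterion SD")
```

With a validity of .60 the error of estimate is still about eight-tenths of the criterion standard deviation, which is why a prediction for any one individual remains loose even when the group coefficient looks respectable.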

When it comes to aptitude tests where the validity based on 
groups is usually only about .60, the cautions in making pre¬ 
dictions for an individual must be even greater. It may be 
necessary to add more tests to get at the patterns. It is im¬ 
portant to make observations of the individual while at work 
on the tests. It is valuable to know about interests and experiences
of the individual. Only then can the predictions begin
to approach significance. 

Kinds of Tests 

From present observations of the immigrant as a test¬ 
ing problem it seems that there are sufficient tests already 
available that are adaptable for the immigrants. It is not neces¬ 
sary to make new tests. Performance tests of aptitude can be 
administered with little difficulty. The language factor is mini¬ 
mized, the cultural factor is lessened and observations of 
method and behavior can be made to illumine the test results. 
We have a little experience already with some tests. Some of 
our clients were examined at the YMCA Vocational Service 
Center. We found that the usual paper-and-pencil mechanical 
aptitude tests did not give as valuable information in filling out 
the interview data as the performance type of test and those 
which measured abilities more specifically like the Minnesota 
Paper Form-Board. The performance tests which seemed good 
were the Minnesota Spatial Relations Tests, Formboards A & B, 
the Finger and Tweezer Dexterity Tests, the Purdue Pegboard 
and the Placing and Turning Tests. The Wechsler-Bellevue In¬ 
telligence Scale can be used effectively if the Verbal Part is 
properly evaluated. 

The language factor is not as important as is the cultural. 
For instance, a direct translation of the Wechsler-Bellevue 
will still have peculiarly American items like George Washing¬ 
ton’s birthday, the height of the average American woman, 
some of the Picture Arrangement items and Picture Comple¬ 
tion items. The paper-and-pencil mechanical aptitude tests 
have many items strangely unfamiliar, not only to immigrants, 
but to many of us. These kinds of tests are contra-indicated. 

Clerical tests, like the Minnesota Clerical Test, may be administered
to those immigrants who have interest in clerical
work and are able to read and write English. The Kuder Pref¬ 
erence Record seems preferable to the Strong Vocational Interest 
Blank for our groups. 

For personality description, the projective tests, like the 
Rorschach, would be possible. The Rorschach has been success¬ 
fully used for diagnostic purposes with immigrants. There is
some question as to its use for vocational purposes; that is, in
terms of obtaining behavioral descriptions that would predict 
how a person would function in different work conditions. This 
use of the Rorschach deserves much more research. In fact the 
Vocational Services Department is contemplating the use of 
several projective techniques to get at the personality 
attributes and to indicate their functioning in terms of voca¬ 
tional goals. 

The present norms on the tests can be used while immigrant 
norms are being developed. The immigrant norms, however, 
will have to be validated against the degree of success in train¬ 
ing or at work. In this way scores on the immigrant norms can 
be related to the standard norms. The establishment of immigrant
norms should be used as a statistic to improve the accuracy
of prediction. But it cannot take the place of the holistic
or clinical evaluation and interpretation of the test results.
Finally, the test results need to be carefully integrated with 
the subsequent interviews by counselors. 



AN INVESTIGATION OF THE PERSONALITY TRAITS 
OF ART STUDENTS 1 


MARTIN SPIAGGIA 
City College Vocational Advisement Unit 

Introduction 

Many opinions have been voiced concerning the nature of 
artistic persons. Predominant among these is the belief that 
artists are emotionally unstable. Lombroso, a nineteenth cen¬ 
tury psychiatrist, is cited by Rank (27) to have advanced a 
theory on the “insanity of genius” which treated features departing
from the normal as “pathological.” Psychoanalysis
also, as Rank shows, has tended either to identify the artist 
with the neurotic—particularly in Sadger’s and Stekel’s argu¬ 
ments—or to explain the artist on the basis of inferiority feel¬ 
ings, as in Adler’s school of thought. 

Whether there is any factual basis in this “abnormal” point 
of view, or whether it has been merely a manifestation of the 
universal tendency to ascribe weakness and idiosyncrasy to the
highly gifted, has not yet been experimentally determined. 
The bulk of published psychological experimentation in this 
realm concerns itself with the relationships between artistic 
ability and such factors as intelligence (3, 33), perceptual fa¬ 
cility (15), and creative imagination (17, 18, 19). Little work 
has been done, however, in studying the personality of the 
artist. Previous studies which appear relevant to the research 
at hand are described briefly below. 

Data gathered on several hundred college students at the 
University of Minnesota (2) indicated no significant relation¬ 
ship between ability in art and introversion, submissiveness, 
or emotional instability. The Bathurst Diagnostic Temperament 
Test and the Bernreuter Personality Inventory were used; ability 

1 This study was submitted in partial fulfillment of the requirements for the degree
of Master of Arts at New York University. For valuable help and criticism, the writer
is indebted to Dr. Naomi Stewart, who sponsored the study.

in art was measured by the Meier-Seashore Art Judgment Test
and the McAdory Art Test, supplemented by the judgment of
instructors.

In a study by Fleming of 84 girls at the Horace Mann School 
for Girls (6) rating scales were used for determining who the 
“artistic” girls were. These ratings were correlated with teachers’
estimates of various personality traits. The coefficient of
contingency between “talented in some field of art” and “personality,”
as rated by the teachers, was found to be .25, on
the basis of which Fleming argues for a “definite tendency for
those with artistic talent to possess what is commonly called
personality.” No explanation is given of what is “commonly
called personality.” 

Prados, using the Rorschach, found the following common
features among 20 professional artists (26): superior intelligence,
fear of mediocrity and disregard for the routine problems
of everyday life, strong drive for achievement and richness of 
the inner interests, and pronounced sensitiveness and emo¬ 
tional responsiveness to the outer world along with a lack of 
adaptability to it, the last mentioned feature tending to be 
counterbalanced by sound intellectual control. 

Roe, in a study of 20 prominent American male painters 
(28, 29), found them to be sensitive, non-aggressive, emotionally 
passive, hard working, self-disciplined, and of superior intel¬ 
ligence. She found nothing in the personalities or intellectual 
powers of her subjects, as measured by the Rorschach and 
Thematic Apperception tests, that was radically different in a 
qualitative sense from those of other people. She found, how¬ 
ever, that the type of social and sexual adaptation was of a 
markedly non-aggressive sort and hence rather more “feminine” 
than “masculine” according to our cultural stereotypes. 

The object of the present study was to investigate differences 
in personality traits, as measured by the nine scales of the 
Minnesota Multiphasic Personality Inventory, between art stu¬ 
dents and non-art students matched with them on age and 
intelligence. 

Population 

Art Students —The subjects, all volunteers, were 50 male 
art students, age 18 or above, who had attended a recognized
art school in New York City (excluding commercial-art schools)
for at least two years, and who intended to make art work their 
vocation. 

Control Group of Non-Art Students —The control group was 
composed of 50 male subjects who were not art students, and 
were selected randomly from the general population in New 
York City, and in Rockland and Orange Counties of New York 
State. Some of the occupations included were hospital atten¬ 
dant, automobile mechanic, electrician, shoemaker, chauffeur, 
teacher, accountant, and graduate student. 

TABLE 1

Comparison of Minnesota Multiphasic Personality Inventory Results Obtained on 50
Art Students and 50 Non-Art-Student Controls Matched on Age and Otis IQ

                              Art Students      Controls       Art Student Mean     S.E. of Mean
Variable                      Mean      SD      Mean      SD   minus Control Mean   Difference     t ratio

Age (years last birthday)     24.64    6.16     24.62    5.47       +  .02              -            -
Otis IQ                      111.56   10.49    112.68   10.82       -  .33              -            -
Hypochondriasis               52.56   10.06     51.66    9.00       +  .90             1.93          .47
Depression                    56.74   10.79     53.16    5.28       + 3.58             1.66         2.15*
Hysteria                      59.24    7.91     57.74    7.43       + 1.50             1.51          .99
Psychopathic Deviate          59.14   13.40     50.28    4.96       + 8.86             1.97         4.49†
Interest                      70.10   11.S2     55.92    5.86       +14.18             1.86         7.62†
Paranoia                      54.00    7.17     47.18    5.91       + 6.82             1.36         5.01†
Psychasthenia                 53.88   10.16     50.45    6.54       + 3.43             1.67         2.05*
Schizophrenia                 55.90   10.74     49.46    4.71       + 6.44             1.61         4.00†
Hypomania                     61.38   10.93     53.31    5.56       + 8.06             1.72         4.69†

* Significant at the 5% level of confidence.
† Significant at the 1% level of confidence.


Testing was conducted at the Psychology Laboratory of New 
York University between July, 1947, and July, 1948. 

Each art student was matched with a control on the basis of 
chronological age and Otis IQ (25): within 3 points on age and 
5 points on IQ. The mean age for Art Students and Controls 
was 24.6; the sigma on age was 6.2 for Art Students and 5.5 
for Controls. On Otis IQ, the mean and sigma for Art Students
were 111.6 and 10.5; the Otis IQ mean and sigma for Controls
were 112.7 and 10.8.
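
The matching rule just stated (within 3 points on age and 5 points on IQ) can be sketched as a simple pairing routine. The article does not say how controls were actually chosen among eligible candidates, so the greedy nearest-candidate rule, the function name `match_controls`, and the sample records below are all assumptions made purely for illustration.

```python
def match_controls(art_students, candidates, max_age_gap=3, max_iq_gap=5):
    """Greedily pair each art student with one unused control candidate.

    Each person is a (label, age, iq) tuple. Returns (pairs, unmatched).
    """
    available = list(candidates)
    pairs, unmatched = [], []
    for student in art_students:
        _, age, iq = student
        eligible = [
            c for c in available
            if abs(c[1] - age) <= max_age_gap and abs(c[2] - iq) <= max_iq_gap
        ]
        if not eligible:
            unmatched.append(student)
            continue
        # Prefer the closest candidate; scaling each gap by its tolerance
        # is an arbitrary choice, not something stated in the article.
        best = min(
            eligible,
            key=lambda c: abs(c[1] - age) / max_age_gap
            + abs(c[2] - iq) / max_iq_gap,
        )
        available.remove(best)
        pairs.append((student, best))
    return pairs, unmatched


# Hypothetical records, not data from the study.
art = [("A1", 22, 110), ("A2", 30, 121), ("A3", 19, 98)]
pool = [("C1", 24, 113), ("C2", 29, 118), ("C3", 40, 99), ("C4", 21, 109)]
pairs, unmatched = match_controls(art, pool)
```

In this made-up pool the third art student finds no control inside both tolerances and is left unmatched, which shows why a matched design of this kind needs a candidate pool larger than the experimental group.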

Procedure 

Raw scores for the nine scales of the Minnesota Multiphasic
Personality Inventory (10, 11, 12, 13, 14) were obtained for
each subject. Standard score equivalents, or T-scores, were
determined, full account being taken of the supplementary
scores, that is, the Lie, Question, and Validity scores.

On each of the nine Multiphasic scales the difference in 
standard score was obtained for each art student and his non¬ 
art-student control matched for age and IQ. These T-score 
differences were distributed and the mean and sigma of each 
distribution of differences obtained. 

An estimate of the standard error of the mean of each set of 
differences was then computed by dividing the standard devia¬ 
tion of each distribution of differences by the square root of 
N − 1, thus allowing for the correlation in scores for the two
groups introduced by the matching. A t-ratio was then com¬ 
puted for each variable. 
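
The computation described in this section can be sketched as follows: take the difference within each matched pair, find the sigma of those differences (with divisor N), divide by the square root of N − 1 to estimate the standard error of the mean difference, and divide the mean difference by that standard error. The function name and the five pairs of T-scores are hypothetical and serve only to make the arithmetic concrete.

```python
import math


def paired_t_ratio(art_scores, control_scores):
    """Return (mean_diff, standard_error, t_ratio) for matched pairs."""
    if len(art_scores) != len(control_scores):
        raise ValueError("matched samples must have equal length")
    n = len(art_scores)
    if n < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - c for a, c in zip(art_scores, control_scores)]
    mean_diff = sum(diffs) / n
    # Sigma of the differences, computed with divisor N.
    sigma = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / n)
    # Dividing by sqrt(N - 1) gives the standard error of the mean difference.
    standard_error = sigma / math.sqrt(n - 1)
    if standard_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff, standard_error, mean_diff / standard_error


# Hypothetical T-scores for five matched pairs (not data from the study).
art = [62, 55, 70, 58, 66]
control = [54, 53, 61, 59, 57]
mean_diff, se, t = paired_t_ratio(art, control)
```

Dividing the N-divisor sigma by the square root of N − 1 gives the same value as dividing the (N − 1)-divisor standard deviation by the square root of N, so this agrees with the paired t-test as it is usually written today. For the 50 pairs of the study the divisor would be the square root of 49, that is, 7.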

Results and Discussion 

Table 1 gives all pertinent data. As can be seen from this
table, the art students were significantly higher than the con¬ 
trols in mean scores on the Depression, Psychopathic Deviate, 
Interest, Paranoia, Psychasthenia, Schizophrenia, and Hypo- 
mania Scales of the Minnesota Multiphasic Personality Inven¬ 
tory. These differences were significant at the one per cent 
level for all scales mentioned except the Psychasthenia and 
Depression Scales, where the differences were significant at the 
5 per cent level. 
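
As a rough check, the t ratios and significance levels reported above can be recomputed from the mean differences and standard errors printed in Table 1. The critical values used below (about 2.01 and 2.68, two-tailed, for 49 degrees of freedom) are an assumption taken from a standard t table; the article does not state which values its author consulted.

```python
# (scale, mean difference, standard error) as printed in Table 1.
TABLE_1 = [
    ("Hypochondriasis", 0.90, 1.93),
    ("Depression", 3.58, 1.66),
    ("Hysteria", 1.50, 1.51),
    ("Psychopathic Deviate", 8.86, 1.97),
    ("Interest", 14.18, 1.86),
    ("Paranoia", 6.82, 1.36),
    ("Psychasthenia", 3.43, 1.67),
    ("Schizophrenia", 6.44, 1.61),
    ("Hypomania", 8.06, 1.72),
]

# Assumed approximate two-tailed critical values of t for 49 degrees
# of freedom (50 matched pairs minus 1).
CRITICAL_5_PERCENT = 2.01
CRITICAL_1_PERCENT = 2.68


def classify(mean_diff, standard_error):
    """Return the t ratio and the level at which it is significant."""
    t = mean_diff / standard_error
    if abs(t) >= CRITICAL_1_PERCENT:
        level = "1 per cent"
    elif abs(t) >= CRITICAL_5_PERCENT:
        level = "5 per cent"
    else:
        level = "not significant"
    return t, level


results = {scale: classify(diff, se) for scale, diff, se in TABLE_1}
for scale, (t, level) in results.items():
    print(f"{scale:22s} t = {t:5.2f}  {level}")
```

Under these assumed critical values the classification agrees with the text: Depression and Psychasthenia reach the 5 per cent level, Hypochondriasis and Hysteria reach neither level, and the remaining five scales reach the 1 per cent level.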

If we can safely generalize from the findings of the present 
paper, these results suggest that the art student, as compared 
to the non-art student of similar age and intelligence, is more 
typically introverted, exhibits a greater tendency toward de¬ 
pression, possesses a tendency to disregard social mores or an 
inability to adjust to the outer world, and is more feminine 
in his basic interest pattern. Further, he tends toward over¬ 
productivity in thought and action, these being of unusual 
character, and also toward compulsive behavior. 

Several factors must, however, be considered in interpreting 
these findings. Concerning the Interest Scale, on which the art 
students were found to score significantly higher, we must take 
heed of the caution by the authors that homosexuality must
not be assumed on the basis of a high score without confirmatory
evidence, owing to the relatively low reliability of this scale. 
Burton (1) administered the Interest Scale to 20 rapists, 34 
sexual inverts and 84 other delinquents. Although he found 
significant differences between inverts and rapists, and also 
between inverts and delinquents who were sexually normal, 
on retest of 34 cases the reliability coefficient was found to be 
only .70. 

The fact that Interest scores are related to cultural factors 
(31) must also be taken into account in interpreting the In¬ 
terest findings. Roe, for example, in the study previously men¬ 
tioned (28, 29), interprets the “feminine” type of sexual adap¬ 
tation of a group of male artists as reflecting the attempt on the 
part of our society to maintain one acceptable male stereo- 
type. 

The high mean on the Paranoia Scale would seem, however, 
to add weight to the significance of the high mean on the In¬ 
terest Scale, in light of current psychoanalytic theory which 
stresses the partial failure of repression of homosexual tenden¬ 
cies in the psychogenesis of paranoia (4). Ferenczi (5) goes so 
far as to consider paranoia as distorted homosexuality. Hender¬ 
son and Gillespie (13), however, describe eleven cases of par¬ 
anoia in only four of which the etiology of the paranoia was in 
agreement with Freudian conceptions. They claim that the 
causation of paranoid conditions is probably not by any means 
uniform, but that type of personality is one of the commonest 
predisposing elements. The sensitive, introverted individual, 
such as was found common in the art-student group, is men¬ 
tioned as one of the types particularly susceptible to paranoia. 

The significantly high scores of the art students on the 
Schizophrenia and Psychasthenia Scales appear readily inter¬ 
pretable. It would seem likely that by virtue of his higher 
“cultural” level, the art student encounters difficulty in ad¬ 
justing to the outer world and finds it psychologically necessary 
to turn inward, appearing introverted, and giving rise to the 
high Schizophrenia mean. The tendency toward compulsive 
behavior, shown by the art students on the Psychasthenia Scale,
is due in part to the overlapping of items and the high correlation
between the Schizophrenia and the Psychasthenia Scales
(.84 for normals, .75 for abnormal cases). It may also reflect 
a real tendency toward compulsive behavior on the part of the 
art-student group. 

The high mean score for the art students on the Hypomania 
Scale may appear inconsistent with the high mean for this 
group on the Depression Scale, and with the introvertive pat¬ 
tern which seems to typify the group. It must be remembered, 
however, that the Hypomania Scale presumably measures over¬ 
productivity of thought as well as action. It seems plausible 
that the high Hypomania mean for the art students is accountable
in terms of overproductivity of thought; that because of
their introvertive tendency they express these thoughts in sym¬ 
bolic forms rather than in action in the ordinary sense of the 
word. 

The relatively high mean score of the art-student group on 
the Psychopathic Deviate Scale is contrary to expectations, in 
light of the introvertive pattern manifested for this group, 
since behavior of the psychopathic deviate variety is usually 
associated with extroversive tendencies. The relatively high 
Psychopathic Deviate mean score is, however, consistent with 
the ordinary layman’s stereotype of the artist. It must also 
be considered that while the Multiphasic appears adequate for 
giving a general over-all pattern of group behavior, it loses 
in validity when an attempt is made to interpret findings on any 
given scale taken in isolation. 

Further caution is prescribed in interpreting the results discussed
here. Owing to the preliminary status of some of the
Multiphasic Scales, the overlapping of items among the various
scales, and the lack of experimental determination of reliabilities,
the Multiphasic is still in an incomplete state of development.

Note must also be taken of the limitations of the present 
study with respect to sampling. The art students were all from 
professional art schools in New York City and cannot be taken 
to represent art students throughout the country. The number 
of cases, while sufficient to yield statistically significant differences
for many of the comparisons made, is also very small,
in an absolute sense; the differences, therefore, while significant,
are not highly reliable.

Summary 

Differences in personality traits between art students and 
non-art students matched with them on age and intelligence, 
as measured by the nine scales of the Minnesota Multiphasic 
Personality Inventory, have been investigated. The findings
reveal significantly higher mean scores for the art students on 
the Depression, Psychopathic Deviate, Interest, Paranoia, 
Psychasthenia, Schizophrenia and Hypomania Scales of the 
Multiphasic. These findings seem, on the whole, to be psycho¬ 
logically meaningful. 

Owing to the selective character of the sample used and to 
the inadequacies of the Multiphasic as a tool for personality 
diagnosis, caution is indicated in interpreting these results. 

Further study of the personality characteristics of different 
vocational and social groups is recommended. On the basis of 
the present findings, it would seem that investigations along 
such lines can afford material aid to the understanding of 
various social problems. 

REFERENCES

1. Burton, A. “The Use of the Masculinity-Femininity Scale of
   the Minnesota Multiphasic Personality Inventory as an
   Aid in the Diagnosis of Sexual Inversion.” Journal of Psychology,
   XXIV (1947), 161-164.

2. Carroll, H. A. “A Preliminary Report on a Study of the Relationship
   Between Ability in Art and Certain Personality
   Traits.” School and Society, XXXVI (1932), 285-288.

3. Carroll, H. A. and Eurich, A. C. “Abstract Intelligence and
   Art Appreciation.” Journal of Educational Psychology,
   XXIII (1932), 214-220.

4. Fenichel, O. The Psychoanalytic Theory of Neurosis. New York:
   Norton and Company, 1945.

5. Ferenczi, S. Contributions to Psycho-Analysis. Boston: Gotham
   Press, 1916.

6. Fleming, E. G. “Personality and Artistic Talent.” Journal of
   Educational Sociology, VIII (1934), 27-33.

7. Friedlander, M. I. “An Art Expert’s Observations on Personality.”
   Character and Personality, I (1932), 75-7S.

8. Hathaway, S. R. Supplementary Manual for the MMPI. Part I.
   The K Scale and its Use. Part II. The Booklet Form of the
   MMPI. New York: Psychological Corporation, 1946.

9. Hathaway, S. R. and McKinley, J. C. “A Multiphasic Personality
   Schedule (Minnesota): I. Construction of the Schedule.”
   Journal of Psychology, X (1940), 249-254.

10. Hathaway, S. R. and McKinley, J. C. “A Multiphasic Personality
    Schedule (Minnesota): II. A Differential Study of
    Hypochondriasis.” Journal of Psychology, X (1940), 255-268.

11. Hathaway, S. R. and McKinley, J. C. “A Multiphasic Personality
    Schedule (Minnesota): III. The Measurement of Symptomatic
    Depression.” Journal of Psychology, XIV (1942), 73-84.

12. Hathaway, S. R. and McKinley, J. C. Manual for the Minnesota
    Multiphasic Personality Inventory. New York: Psychological
    Corporation, 1945.

13. Henderson, D. K. and Gillespie, R. D. A Textbook of Psychiatry.
    London: Oxford University Press, 1946.

14. Hurlock, E. B. and Thompson, J. L. “Children’s Drawings: An
    Experimental Study of Perception.” Child Development, V
    (1934), 127-138.

15. Liss, E. “The Graphic Arts.” The American Journal of Orthopsychiatry,
    IX (1938), 9^-99.

16. Lowenfield, V. The Nature of Creative Activity. (Translated by
    O. A. Oeser.) New York: Harcourt and Brace, 1939, 272 pp.

17. Markey, F. V. “Imaginative Behavior of Young Children.”
    Child Development Monograph, No. 18, 1935.

18. McCloy, W. “Creative Imagination in Children and Adults.”
    Psychological Monograph, LI (1939), 88-103.

19. McCloy, W. and Meier, N. C. “Re-Creative Imagination.”
    Psychological Monograph, LI (1939), 108-116.

20. McKinley, J. C. and Hathaway, S. R. “A Multiphasic Personality
    Schedule (Minnesota): IV. Psychasthenia.” Journal
    of Applied Psychology, V (1947), 614-624.

21. McKinley, J. C. and Hathaway, S. R. “The Minnesota Multiphasic
    Personality Inventory. V. Hysteria, Hypomania, and
    Psychopathic Deviate.” Journal of Applied Psychology,
    XXVIII (1944), 153-174.

22. McKinley, J. C. and Hathaway, S. R. “The Identification and
    Measurement of the Psychoneuroses in Medical Practice.”
    Journal of the American Medical Association, CXXII (1943),
    161-167.

23. Meier, N. C. “Recent Research in the Psychology of Art.” Yearbook
    of the National Society for the Study of Education, XL
    (1941), 379-409.

24. Meier, N. C. “Special Artistic Talents.” Psychological Bulletin,
    XXV (1928), 265-271.

25. Otis, A. S. Statistical Methods in Educational Measurements.
    New York: World Book Co., 1925.

26. Prados, M. “Rorschach Studies on Artists-Painters.” Rorschach
    Research Exchange, VIII (1944), 178-183.

27. Rank, O. Art and Artists. New York: Tudor Publishing Co.

28. Roe, A. “Painting and Personality.” Rorschach Research Exchange,
    XL (1946), 86-100.

29. Roe, A. “The Personality of Artists.” Educational and Psychological
    Measurement, VI (1946), 401-408.

30. Schiele, B. C., Baker, A. B. and Hathaway, S. R. “The Minnesota
    Multiphasic Personality Inventory.” The Journal-Lancet,
    LXIII (1943), 292-297.

31. Terman, L. M. and Miles, C. C. Sex and Personality. Studies in
    Masculinity and Femininity. New York: McGraw-Hill, 1936.

32. Tiebout, C. and Meier, N. C. “Artistic Ability and General
    Intelligence.” Psychological Monograph, XL (1936), 95-125.



THE KNOWLEDGE OF GENERAL EDUCATION OF A 
SAMPLE OF SYRACUSE UNIVERSITY STUDENTS AS 
REVEALED BY THE COOPERATIVE GENERAL CUL¬ 
TURE TEST AND THE TIME MAGAZINE CURRENT 
AFFAIRS TEST. 


N. M. DOWNIE 
The State College of Washington 

M. E. TROYER and C. R. PACE 
Syracuse University 

During the academic year, 1947-1948, Syracuse University 
initiated an all-university self-survey, the results of which were 
to provide the bases for enlightened planning for the years 
ahead. Among the concerns of the various survey committees 
was an investigation of the program of general education of 
the University. 

As a part of this study of general education, a sampling of 
seniors, members of the Class of 1948, and of sophomores, members
of the Class of 1950, were given Form X of the Cooperative
General Culture Test and the September, 1947, edition of the
Time Magazine Current Affairs Test. These tests were administered
late in December of 1947 and during the first school
days of January, 1948. The following five Colleges of the University
had students participating in the program: Applied
Science, Business Administration, Fine Arts, Home Economics
and Liberal Arts.

Raw test scores on the Ohio Psychological Examination, Form
21, were obtained for as many students as possible. Mean scores
on this test were computed for each college and class. These 
means were tested for significant differences by means of the 
“t” test and the homogeneity of their variances by the “F” 
test. No significant difference was found between the mean 
score for all of the seniors and the mean score for all of the 
sophomores. When the mean score for each college was compared
with the mean of the total seniors and with that of the
total sophomores, the only significant difference was found to 
be that the mean score of the Liberal Arts sophomores was sig¬ 
nificantly higher than that of all other sophomores. 

An analysis of covariance technique was applied to the data 
to determine whether, if intelligence test scores were held con¬ 
stant, there was any difference between the total scores of the 
seniors and sophomores on the Cooperative General Culture Test. 
An “F” ratio of .784 was obtained. This led to the acceptance 
of the null hypothesis that there was no significant difference 
between the means of the two classes on the total scores of this 
test. The Welch-Nayer Test was used to check the assumption 
of homogeneity of variances of the two groups. The variances 
were found to be homogeneous. 

When the test was analyzed by subtests and for the differ¬ 
ent classes in the five colleges, numerous significant differences 
appeared as shown in Table 1. The “t” test was used to test the
significance of the differences between the means. Variances 
of each set of means were compared, using the “F” ratio. It was 
found in two cases where significant “t’s” appeared—Current 
Social Problems, for the Applied Science and Home Economics 
students—that the real difference was caused by the variances 
of the two distributions. 

Table 1 shows the mean total and part scores by college and 
class on this test. On studying this table, one sees that, in gen¬ 
eral, the students of Syracuse University achieved well above 
the mean on national norms for college sophomores. As a matter 
of fact, of the eighty-four mean scores reported in this table, 
only ten are below the mean on national sophomore norms. 

On studying the six part-scores of the test for each college, 
one sees that in the College of Liberal Arts both seniors and 
sophomores were well above the all-university mean for each 
part, with the seniors and sophomores significantly different 
from it on Current Social Problems, History and Social Studies 
and Literature, and the seniors in Science. The seniors in Ap¬ 
plied Science achieved significantly above the all-university 
mean in Science, Mathematics and Current Social Problems,
and significantly below it in Literature and Fine Arts. The 
Applied Science sophomores were above the mean in Science
and Mathematics (significantly so), while on the other areas of
the test, they approximated it. 

The seniors in the College of Business Administration fell
significantly below the mean in Literature, Science and Fine
Arts and hovered around it in other areas of the test. The sophomores
in the same college were significantly below the mean
in the same three areas plus Mathematics and significantly
above it in Current Social Problems.

TABLE 1

Mean Total and Part Scores by College and Class on the Cooperative General Culture Test

                                   Current  History and
                                    Social       Social
College and Class           N     Problems      Studies   Literature      Science    Fine Arts  Mathematics        Total
L. A., 1948               110         46.1         49.1         43.8         33.4         45.0         24.5        242.0
L. A., 1950               108         44.2         47.7         38.7         31.4        §40.0         26.8        228.8
A. S., 1948                40        *46.5         44.3         28.7         39.4         37.0         44.2        240.0
A. S., 1950                57         42.5         45.5         30.5         40.8         35.7         45.7        240.4
B. A., 1948                77         44.1         43.7         30.1         25.1         37.3         23.7       *203.3
B. A., 1950                62         44.7         43.8         28.4         26.5         32.1        *24.2        199.9
F. A., 1948                71         35.9         36.1         36.0         21.9  [illegible]  [illegible]  [illegible]
F. A., 1950       [illegible]         35.1         34.8         31.1         21.9  [illegible]  [illegible]  [illegible]
H. E., 1948                37         39.3         36.1         28.7         27.7         39.5         13.6        184.9
H. E., 1950                29         34.1         36.9         32.5         31.0         38.7         17.3        190.5
Total, 1948               335         42.8         43.1         35.5         29.1         43.5         23.2        217.1
Total, 1950               311         41.2         42.9         33.2         30.1        §38.8        §26.5        212.7
National||               8500         33.7         34.3         30.9         24.4         31.9         17.2        172.5

* Significant difference between college mean and all-university mean for this
class (5% level).

† Significant difference between college mean and all-university mean for this
class (1% level).

‡ Significant difference between seniors and sophomores in the same college (5%
level).

§ Significant difference between seniors and sophomores in the same college (1%
level).

|| Based on a random sampling of approximately 8500 sophomores from those
colleges whose reports were received before April 7, as reported in the 1947 National
College Sophomore Testing Program, Cooperative Test Service, May 1947.

[Most significance markers attached to individual entries are illegible in this copy;
only those that can be read with confidence are shown.]

Both classes in the College of Fine Arts were significantly 
below the all-university mean on Current Social Problems, 
History and Social Studies, Science and Mathematics, around 
the mean in Literature and significantly above it in Fine Arts. 






















In the College of Home Economics, both classes were significantly
below the all-university mean on Current Social Problems,
History and Social Studies and Mathematics, the seniors
significantly below it in Literature and both classes around the 
mean in the other areas. 

When seniors and sophomores were compared, a few inter¬ 
class differences appeared. The sophomores as a group scored 
significantly higher than the seniors in Mathematics and lower 
in Fine Arts, Literature and Current Social Problems. In the 
Colleges of Liberal Arts and Fine Arts, the seniors were signifi¬ 
cantly higher in Literature and Fine Arts and the Business 
Administration seniors in Fine Arts. 

An item analysis of this test was made, using all of the papers 
to determine the percentage of students in each class of the 
five colleges who responded to each item correctly. On Part I, 
Current Social Problems, the students as a whole did rather 
well. They were best informed on items concerned with labor 
unions and labor activities. Other attempts to classify the items 
failed to show any particular area that was either very good or 
very poor. 
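The tabulation described above, the percentage answering each item correctly within each college and class, can be sketched as follows. The record layout, the function name `percent_correct` and the sample answers are hypothetical. Omitted items are assumed to be coded as None and counted as incorrect.

```python
from collections import defaultdict


def percent_correct(responses, key):
    """Percentage of correct answers per item, by (college, class) group.

    responses: list of (college, class_year, answers) records, where
               answers is a list of chosen options (None = omitted).
    key:       list of the correct option for each item.
    Returns {(college, class_year): [percent correct for item 1, 2, ...]}.
    """
    counts = defaultdict(lambda: [0] * len(key))
    totals = defaultdict(int)
    for college, class_year, answers in responses:
        group = (college, class_year)
        totals[group] += 1
        for i, (given, correct) in enumerate(zip(answers, key)):
            if given == correct:
                counts[group][i] += 1
    return {g: [100.0 * c / totals[g] for c in counts[g]] for g in totals}


# Made-up answer key and papers for a three-item part.
key = [1, 3, 2]
papers = [
    ("L. A.", 1948, [1, 3, 4]),
    ("L. A.", 1948, [1, 2, 2]),
    ("F. A.", 1950, [2, 3, None]),
]
for group, pcts in percent_correct(papers, key).items():
    print(group, pcts)
```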

Some of the Current Social Problems items, on which the 
students did rather poorly, are listed below. In selecting from 
teachers, farmers, industrial workers, white-collar workers and 
civil service employees, the group least likely to suffer from in¬ 
flation, less than 50 per cent of the students chose the correct 
answer. An item which called for the knowledge that the doc¬ 
trine of states’ rights was used as an argument against 
federal antilynching legislation was known by less than 30 
per cent of the students. Two other items, the period of life 
when the greatest incidence of tuberculosis occurs and the 
meaning of the term “Nisei” were likewise unknown to 50 per 
cent of the students. Another item which called for a definition 
of “nationalization of industry” was answered correctly by 
less than 20 per cent of the students. 

A comparison of the results of the item analysis of Part II 
of this test, History and Social Studies, showed that on this 
part of the test, as on Part I, the students as a whole did quite 
well. As might be expected, items concerned with American 
history were easier than those related to European or Asiatic
history. Items related to psychology were answered very well
by all of the students. 

Some interesting things appeared when individual items were 
studied. On one item, the student selected from the following— 
aristocratic, autonomous, autocratic, autarchic and anarchistic 
—the one that best described a government controlled by the 
will of one man. About 50 per cent of the students answered 
this item correctly. The item which asked whether the elec¬ 
tion of Harding was a repudiation of the Republican Party, 
Ku Klux Klan, Roman Catholic Church, capitalist system or 
the League of Nations was likewise missed by about one-half 
of the students. The concept of “Balance of Power” was also
unknown to about the same number of students. Perhaps one 
of the most elementary things missed on this part of the test 
was the type of government in existence in Switzerland.
Twenty-five per cent of the students answered this item in¬ 
correctly. 

On Part III of the test, Literature, a study of the item analy¬ 
sis showed that, in general, the students did poorly. Many 
students omitted a large number of items. There was evidence, 
however, that most of the items were attempted by most of 
the students because several items toward the end of the sec¬ 
tion were answered correctly by nearly every student. 

An attempt was made to see if there were specific areas such 
as American literature, English literature, poetry or drama in 
which the students did better than in others. A comparison of 
the items related to American literature with those concerned 
with English literature showed that the students did about the 
same in both areas. Results on items related to Graeco-Roman 
literature were quite poor for all groups. When the items were 
studied as to whether they pertained to poetry, drama, ex¬ 
position, etc., no evidence was found to show that the students 
did better in one of these areas than in another. 

One rather interesting thing did appear from the item analy¬ 
sis. Included in the ninety items on literature were ten items 
which referred to Biblical characters or situations. Of these 
ten items, the students answered only two of them well. Prac¬ 
tically everyone knew that Samson was distinguished for his 
strength and that the walls of Jericho came down on the blowing
of trumpets. Less than a quarter of the students in all of the
colleges knew Lazarus as a beggar. About 35 per cent of all 
students were aware that St. Paul was converted to Christianity 
on the road to Damascus and the same percentage knew that 
Lot was rescued from the destruction of his city. The hand¬ 
writing on the wall was recognized as occurring in the Court 
of Belshazzar by about 30 per cent of the students; the father 
and son relationship of David and Solomon was known by about 
35 per cent of the students; and the fact that Joseph was sold 
as a slave to the Egyptians was common knowledge to only 
about 55 per cent of the students. On all of these items, the 
Liberal Arts students did only slightly better than the students 
in other colleges. 

On the recognition of authors, the item analysis revealed 
that the students were very well acquainted with O. Henry, 
Pearl Buck, Rudyard Kipling, Robert L. Stevenson, Washing¬ 
ton Irving and Booth Tarkington; but the following authors, 
whose writings are more difficult and more provocative of 
thought, were quite unfamiliar to the majority of students— 
Thomas Mann, Sholem Asch, Andre Maurois, Thomas Wolfe 
and Aldous Huxley. 

Several members of the English Department looked over this 
part of the test in order to judge whether each item was some¬ 
thing that a generally educated person should know. Of the 
ninety items, only seven were considered as being of too tech¬ 
nical a nature. 

A study of the item analysis of Part IV, Science, showed that, 
except for the College of Applied Science, students were rather 
poorly informed about general science. Of the sixty items com¬ 
posing this part of the test, only three stood out as being known 
by almost all of the students. These items were concerned with 
the recognition of the metallic element found in the red color¬ 
ing matter of blood, the tarnishing of silver as an example of 
oxidation, and the major purpose of scientific investigation. 
One item which asked the students to select from the following 
the one that is not a science—organic chemistry, astronomy, 
bacteriology, geology and astrology—was answered correctly 
by as few as 60 per cent of the seniors in the College of Business 
Administration. Sophomores in the same college and students
in both classes of the Colleges of Fine Arts and Home Economics
did only slightly better on the same item. An item concerned 
with the structure in animals that produces eggs was missed 
by almost 20 per cent of all students. The name of the gas 
given off by a poorly damped furnace was unknown to 50 to 
25 per cent of the students in all of the colleges except Applied 
Science. Two items related to the use of the scientific method 
were rather well done by all of the students except those in the 
College of Fine Arts. An item on the cause of the formation of 
dew at night and one on the time zones of the United States 
were responded to correctly by about 60 per cent of all students. 
The law of moments was applied correctly to a problem by 50 
per cent of the Liberal Arts students and by from 40 to 50 
per cent of those in Business Administration, Fine Arts and 
Home Economics. The item which stated that osmosis is a 
process of—oxidation, diffusion, absorption, reduction, or mag¬ 
netic attraction—was answered correctly by about 45 per cent 
of the Liberal Arts students, 40 per cent of the engineers, 20 
per cent of the Business Administration students, 15 per cent 
of the Fine Arts students and 60 per cent of those in Home 
Economics. 

In the groups other than engineering, less than 50 per cent 
knew the use made of a carpenter’s level. The traditional story 
of the bees and the birds and the flowers would have misfired 
with these students as less than a quarter of them knew about 
pollination. The general characteristics of man as a vertebrate 
likewise were rather obscure to these students, with less than 
half of them able to select from fish, sponge, oyster, lobster 
and insect, the animal that is most similar in structure to man. 

Even the Applied Science students, who scored as a group 
high on this part of the test, showed that they were not gen¬ 
erally educated in the area of science. A study of the item 
analysis revealed the specificity of their knowledge. In general, 
they did excellently on items concerned with mechanics, heat, 
light, sound and electricity, but, on items concerned with biol¬ 
ogy and geology, they did no better than the students in the 
other colleges. Several of the items concerned with the wear¬ 
ing of white and woolen clothing showed that the engineers 
transferred their knowledge of heat and light rather poorly to
actual life situations, with about 30 per cent of the students
missing the items. 

Four members of the faculty, one each in the areas of bac¬ 
teriology, chemistry, physics and zoology, were asked to look 
over the items on this part of the test and to consider them in 
the same manner as the members of the English Department 
were asked to treat the Literature items. The group as a whole 
thought that this part of the test was a rather good test of 
general education in the area of science. A half dozen items
were considered to be too specific to be included in a test of 
general education. 

The item analysis of Part V of the test, Fine Arts, showed 
that on the whole the students were rather poorly informed in 
the various areas of the fine arts, except for students in the 
College of Fine Arts. But even in this group, unexpectedly poor 
results showed for many of the items. 

Some of the more interesting results are noted below. Prac¬ 
tically all of the students located the Hanging Gardens as 
having been in Babylon, but only from one-half to three- 
quarters of all the students in all colleges knew that Serge 
Koussevitsky was an orchestra conductor. Items concerned 
with contemporary fine arts were poorly answered. For ex¬ 
ample, 40 per cent or less of all students (except Fine Arts 
seniors, 59 per cent) recognized Thomas Benton as a contem¬ 
porary American painter and less than a third of all the stu¬ 
dents identified Jacob Epstein as a modern sculptor. Salvador 
Dali fared a little better with from 50 to 80 per cent of the stu¬ 
dents recognizing an outstanding characteristic of his work. 
In music the situation was about the same. Sixty per cent or 
less of the students knew who wrote the Stalingrad Symphony 
and even fewer recognized the composers of Oklahoma. 

Three members of the faculty of the College of Fine Arts 
went over the ninety items of this part in the same manner 
as faculty members treated the other areas. With a few excep¬ 
tions, most of the items were thought to be concerned with 
things that a “generally educated” person should know in the 
area of Fine Arts. 

The item analysis of Part VI, Mathematics, showed that this 
was the most difficult part of the test. The students’ papers
were studied to see just how far the various groups went through
the test. Students in both classes of the College of Applied 
Science attempted nearly all of the items. The median last 
item attempted was fifty-six for both classes of the College of 
Liberal Arts. (There are sixty items on this part of the test.) 
For students in the College of Business Administration, the 
median last item attempted was fifty-six for the seniors and 
fifty-nine for the sophomores. In Fine Arts this median dropped 
to forty-five for the seniors and to fifty-two for the sophomores, 
and in the College of Home Economics, the median was fifty- 
one for the seniors and fifty-eight for the sophomores. 

A study of this item analysis showed that most students 
could perform the simpler arithmetical and algebraic opera¬ 
tions. An item concerned with the extra cost involved when 
articles are purchased on the installment plan showed that 
25 per cent of the students had no idea how to figure correctly
such a common every-day problem. Forty-five per cent of the 
students were able to compute the annual interest on a short¬ 
term loan. A simple question involving buying and selling was 
solved correctly by about 25 per cent of the students outside 
of the College of Applied Science. Sixty-five per cent of the 
Applied Science students solved the problem correctly. A prob¬ 
lem in thinking with symbols—how many minutes are there in 
“p” days—was also difficult, for 30 per cent or more of the 
students, other than engineers, could not solve it. The concept 
of a converse and the ability to state one was also missed by 
more than one-half of the students. Similarly the concept of an 
axiom was unknown to about 70 per cent of the students. 

Course programs of a sample of thirty students, members of 
the Class of 1947, were analyzed to determine the number of 
hours the students carried in the various areas of general edu¬ 
cation. On the basis of this analysis and on the study of var¬ 
ious course patterns as stated in the catalogs of the different 
colleges of the University, estimates were made of the number 
of credit hours the students in the different colleges carried in 
various areas of general education. (Estimates of the general 
education courses of Liberal Arts students were made entirely 
from the catalog.) The discussion which follows covers the 
five areas of general education included in the Cooperative 
General Culture Test. 





The five colleges were ranked on the basis of the amount of 
course work in each of the several areas of general education 
received by the students in each college. The mean scores of 
the students in the five colleges on the six parts of the test were 
also placed in rank order. A comparison of the rank order of the 
number of courses taken and of the mean scores on the six 
parts of the General Culture Test showed that there was in gen¬ 
eral a rather close similarity between the rank order of the num¬ 
ber of hours taken in an area of general education and the rank 
order of the mean score of the students in the five colleges on 
the part of the General Culture Test related to that area. The 
area which deviated most from this was the social studies. 
Here the Applied Science seniors, who ranked fourth in courses 
taken in this area, tied for first place with the Liberal Arts 
seniors who ranked first in the number of courses taken. In the 
Sophomore Class, the Business Administration students, who 
ranked second in the number of courses taken, tied with the 
Liberal Arts students, rank one in courses taken, for first place. 
On the History and Social Studies part of the General Culture 
Test, both classes of the College of Applied Science, rank four 
in the number of courses taken, ranked second in their mean 
scores on the test. 

In the area of Literature, the Business Administration seniors, 
who were tied for lowest place in the number of hours taken in 
this area, were placed in third position with their mean score on 
the culture test. The scores of both classes of the other colleges 
ranked the same as the amount of literature studied with minor 
variations. In the area of Science there was almost a perfect 
relationship between the number of courses taken in science 
and their mean scores on this part of the test. 

On the Fine Arts part of the test, both of the Liberal Arts 
classes, which ranked third in the number of courses taken,
changed places with Home Economics, rank two. A similar 
switch occurred in Mathematics, in which both classes of the 
Liberal Arts College, rank three in courses taken, changed 
places with Business Administration, rank two in courses taken, 
on the rank of the mean score on this part of the Cooperative 
General Culture Test. 
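The agreement between two rank orders, such as course hours and mean scores across the five colleges, can be summarized with a rank correlation. The source reports only a verbal comparison, so the sketch below is an added illustration: Spearman's coefficient computed on average ranks, so that ties like those mentioned above are handled. The function names and all figures are hypothetical.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


# Made-up credit hours and mean part scores for five colleges.
hours = [30, 12, 18, 6, 9]
scores = [33, 39, 25, 22, 28]
print(spearman_rho(hours, scores))
```

A coefficient near 1 would correspond to the "almost perfect relationship" the text describes for Science; lower values would correspond to areas where colleges change places.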

The Time Magazine Current Affairs Test .—This test was 
administered as an untimed test and the students were not
required to put their names on their papers. The test, as it
was made up, consisted of eight parts: U. S. Affairs, Map, 
International, Foreign News, Canada, Science, The Arts and 
Personalities. However, in scoring the test, four of these parts 
—Map, International, Foreign News and Canada were com¬ 
bined into one part which was named “World Affairs.” This 
was done chiefly because of the small number of items in each 
of these four parts of the test. 

Mean-total and part scores for the two classes in each of the 
five colleges were computed. No significant differences were 
found between the seniors and the sophomores for the Uni¬ 
versity as a whole on the total scores and sub-test scores. (The 
statistical techniques used here were the same as those used in comparing
the results of the Cooperative General Culture Test.)
When mean-total scores of each college and class were com¬ 
pared with the all-university mean for each class, it was found 
that the seniors in the College of Applied Science were signifi¬ 
cantly above the mean (5 per cent level), the sophomores in the 
College of Business Administration in a similar situation, and 
that both classes of the Colleges of Fine Arts and Home Economics
were significantly below the mean (1 per cent level).

When the mean scores of parts of the test were analyzed, 
it was found that both classes of the College of Fine Arts and 
Home Economics were significantly below the mean on most 
parts of this test. The exceptions were that both classes of Fine 
Arts approximated the mean on the part entitled “The Arts” 
and the sophomores in Home Economics were below the mean, 
but not significantly so in Science and The Arts. The Applied 
Science seniors and Business Administration sophomores were 
above the mean on U. S. Affairs (5 per cent level). In World 
Affairs, the Liberal Arts seniors were significantly above the 
mean (5 per cent level). In Science both classes of the engineer¬ 
ing school were significantly above the mean at the 5 per cent 
level. The sophomores in the same college were significantly 
below the mean in The Arts (5 per cent level). In Personalities 
both Liberal Arts seniors and the two classes of Business Ad¬ 
ministration were significantly above the mean. 

When results for the two classes in the same college were 
compared, the only difference between seniors and sophomores
appeared in the College of Home Economics where the seniors
did significantly better on U. S. Affairs and World Affairs 
(both 1 per cent level) and were higher on their total scores than 
the sophomores (5 per cent level). 

A comparison of the rank order of the mean scores of the 
different colleges on the various parts of this test with the 
number of courses taken in an area showed results similar to 
those obtained when scores on the Cooperative General Culture 
Test were compared with number of courses taken. The Applied
Science students similarly scored high on the social studies
parts of this test. The seniors ranked first on U. S. Affairs and 
second on World Affairs and the sophomores second on U. S. 
Affairs and first in World Affairs. In the number of social studies 
courses taken, these engineering students ranked fourth. 

Summary of Findings 

1. On the Cooperative General Culture Test, Syracuse University
students ranked high according to national standards.
Converting the mean scores of Table 1 into percentile scores 
placed the mean-total score of the seniors at the 78th percentile 
and of the sophomores at the 76th percentile on national norms. 
The average total score of students in the five colleges was well 
above the national average in all cases and as high as the top 
11 per cent in the best case. These rather high mean percentile 
scores are due in part to the high scores the students made in 
their special areas of study and are not a reflection of a well- 
balanced program of general education. In the areas of the 
test related to the students’ field of specialization, the scores 
averaged from the 75th to the 95th percentiles, but in the areas 
outside of the students’ major fields the scores averaged from 
the 50th to the 70th percentiles.

2. When total scores on the Cooperative General Culture Test
were compared, it was found that students in both classes of 
the Colleges of Liberal Arts and Applied Science scored signif¬ 
icantly above the all-university mean. Seniors in the College 
of Business Administration and both classes of Fine Arts and 
Home Economics achieved significantly below this all-univer¬ 
sity mean. 

3. Achievement in the various areas measured by this test is
definitely related to the amount and pattern of course work
taken in those areas by students; and, even in the major field, 
students’ knowledge tends to be specific to course rather than 
general. Students in Applied Science scored highest on the 
Science and Mathematics parts of the test; students in Fine 
Arts scored highest on the Fine Arts part of the test, etc. When 
the high scores on a part of the test are further examined, the 
specificity of the students’ education is brought into sharper 
focus. For example, the Applied Science students did well on 
the items pertaining to physics and chemistry, but relatively 
poorly on items dealing with the biological and geological 
sciences and on items calling for practical applications of 
scientific principles to daily life. 

4. In areas of study outside of their major fields, students 
scored relatively poorly on the test. For example, Fine Arts 
students scored relatively low on Science, Mathematics, and 
on the parts of the test related to the social sciences. Applied 
Science students scored relatively low on Literature and Fine 
Arts; Business Administration students scored relatively low on 
Literature, Science and Fine Arts; Home Economics students 
scored relatively low on Literature and Mathematics. Lib¬ 
eral Arts students, on the other hand, scored relatively high on 
all parts of the test. Similarly, the engineering students scored 
relatively high in the area of the social studies. 

5. There is apparently no significant increment to general 
education during the last two years of college residence as the 
seniors scored no higher, or not significantly so, than the sophomores
on the Cooperative General Culture Test.

6. On the Time Magazine Current Affairs Test, students in
the Colleges of Liberal Arts, Applied Science and Business Ad¬ 
ministration scored relatively high, whereas students in the 
Colleges of Fine Arts and Home Economics scored relatively 
low. The total score of the seniors on this test was not signifi¬ 
cantly higher than that of the sophomores. The typical Syra¬ 
cuse student was able to answer about half of the items on this 
test correctly. 



THE FULL-RANGE PICTURE VOCABULARY
TEST: II. SELECTION OF ITEMS FOR 
FINAL SCALES 1 

ROBERT B. AMMONS 
University of Louisville 
and 

LEO D. RACHIELE
University of Denver 

Vocabulary items are among the most frequently used 
components of mental tests. They are, as a rule, relatively reliable
and valid, take little time to administer in comparison
with their usefulness, and can be given and answered in so 
many ways that they can often be used successfully where 
other items fail, as in the case of spastic children and certain 
aphasic adults. For general clinical use, a test should not be 
dependent upon the skills of reading and writing, and should 
avoid the ambiguities inherent in administering and scoring 
items calling for definitions by the testee. On the other hand, 
the test should be, of course, as reliable and valid as possible, 
should be short and easy to administer, and should have considerable
intrinsic interest value.

Vocabulary items make up what are probably the best single 
subtests in the 1937 revision of the Stanford-Binet (11) and 
the Wechsler-Bellevue Adult Intelligence Scale (13). Terman and
Merrill report an average correlation of .81 for separate age 
groups between the Stanford-Binet vocabulary test score and 
the mental age on their scale as a whole, while Wechsler states 
that his vocabulary subtest correlated .85 (eta) with the total 
scale for the original standardization group. With these high 
validities in mind, a search was made by the senior author for
a method of vocabulary testing which would meet the clinical
criteria already outlined.

1 Acknowledgment is due Professor F. Y. Billingslea, Mrs. Helen S. Ammons and
Mr. Neil W. Coppinger of Tulane University for reading the manuscript critically and
offering many helpful suggestions. The test plates and a manual with final scale norms,
answer sheets and instructions for administration (1) may be obtained from R. B.
Ammons.

The most promising technique located seemed to be that 
used by Van Alstyne (12), where a child was asked to choose 
from among four pictures on a card the one which illustrated 
a particular language concept, word, or phrase. Since this test 
had been given only a very limited standardization and sev¬ 
eral pictures were out of date, Ammons and Huth (4) set up a 
new set of 16 plates and tried out a considerable number of 
items with a small group of children. An analysis of the results 
from the try-out showed that this type of test could be given 
quickly, was useful at least through the ages of 6 to 17, and 
was highly reliable and valid. On this basis, a series of studies 
(2, 3, 5, 6, 7) was undertaken to construct and standardize a
test for all levels of verbal ability. The present paper is the first 
in the series reporting this work. 

After a testing technique has been decided upon, at least 
three major problems present themselves to the constructor of a 
vocabulary test: (a) how to obtain items of a suitably wide 
range of difficulty, (b) how to select items of satisfactory 
representativeness of content, and (c) how to choose items valid 
for the estimation of differences in level of intellectual ability. 
It is conceivable that random sampling of all word meanings in 
a fairly large dictionary would provide a partial solution to 
these problems. Variations of this method have been used 
frequently. Seashore and Eckerson (10) selected a word from 
each left-hand page of a large dictionary, omitting prefixes, 
suffixes and abbreviations, and obtained a total of 1320 prelimi¬ 
nary items. Similarly, Atwell and Wells (8) chose 100 words 
“by chance” from a 20,000-word dictionary. The preliminary 
form of the Wechsler-Bellevue vocabulary test (13) was a list 
of 100 words, one each chosen from the top of every fifth page 
in a school dictionary, omitting “obsolete, technical, or 
esoteric words.” 
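The page-based selection schemes cited above are systematic samples: one word taken at a fixed page interval, with certain classes of words skipped. A minimal sketch follows, assuming a dictionary represented as a list of pages, each a list of headwords. The function name, the data layout and the exclusion rule are hypothetical and do not reconstruct any cited author's procedure.

```python
def sample_every_nth_page(pages, step, exclude=lambda word: False):
    """Take the first acceptable headword on every `step`-th page.

    pages:   list of pages in order, each page a list of headwords.
    step:    page interval (5 means pages 5, 10, 15, ...).
    exclude: predicate returning True for words to skip (for example
             abbreviations or technical terms).
    """
    chosen = []
    for page in pages[step - 1::step]:
        for word in page:
            if not exclude(word):
                chosen.append(word)
                break  # at most one word per sampled page
    return chosen


# A made-up ten-page "dictionary" with three headwords per page.
pages = [[f"w{p}_{i}" for i in range(3)] for p in range(1, 11)]
print(sample_every_nth_page(pages, 5))
print(sample_every_nth_page(pages, 5, exclude=lambda w: w.endswith("_0")))
```

Because the exclusion rule discards some candidates, the words finally retained are not an equal-probability sample of the dictionary's entries.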

In practice, random selection of vocabulary items does not 
work out particularly well for a number of reasons. If item 
selection techniques are to be employed in the choice of a final 
scale, randomness is lost. Word meanings should probably be 
used as the original population, rather than words themselves.
In a multiple-choice test, the precision of meaning tested is a
function of the alternative words used with the given item. 
Finally, if one uses the picture vocabulary technique, certain 
words cannot well be represented, and item difficulty is deter¬ 
mined to a considerable degree by the nature of the drawings 
themselves and the alternate drawings. For these reasons, in 
this test no attempt at randomness was made and our initial 
words were merely subjectively selected to be as representative 
as possible, on the basis of the pictures already available. An 
analysis of the results as presented later in this paper seems to 
justify this approach. 

The problem of representativeness of content was thus 
handled subjectively. A suitable range of difficulty was obtained 
by a choice of items after testing. Several possible alternatives 
present themselves when one wishes to select items for validity: 
suitability of material can be estimated subjectively, indi¬ 
vidual item correlations with total scale score can be used, and 
correlations of items with outside criteria such as age or mental 
test results can be computed. It will be seen later that a com¬ 
bination of all these with several more specific criteria was 
actually used. 

Problem 

The purpose of this study was to obtain a suitable group of 
vocabulary items and to set up the two final forms of a picture 
vocabulary test, based on the 16 4-picture plates developed by 
Ammons and Huth (4). To accomplish this, it was necessary to 
find a large number of words appropriate to the cards, to try 
these out on a representative population, and to select those 
items meeting the criteria established. 

Procedure 

Materials.—Item selection and testing centered around 16 
4-picture plates (1). With the plates already available, the next 
step was the discovery of a large number and variety of poten¬ 
tially good items to administer to the standardization group. To 
start with, dictionaries were checked and advanced students in 
psychology verbally associated with the plates as stimuli. 
From these sources, 243 words pertinent to the pictures were 
obtained in addition to the 48 selected by Ammons and Huth 
(4), a total of 291. Of these, 43 were eliminated because of 
obvious ambiguity, probable sex differences in experience with 
the word, or regional meaning, leaving 248 items for pretesting. 

Pretesting consisted of administering these 248 items with 
their associated plates to a small sample of children and adults 
varying widely in age and ability. 2 Four children were tested at 
each CA level 2 through 17, and four college students at each 
Wechsler IQ level 99-109, 112-119, 121-129, 131-138, and 
140-144. For children 2 through 5 results were available from 
Form L of the Stanford-Binet; those 6 through 17 were given 
the vocabulary test of Form L; while full Wechsler scale results 
were available for the college students. These estimates of 
verbal ability were later used in the ranking of items by diffi¬ 
culty for setting up the test finally given to the standardization 
group. Two males and two females were tested at all but 3 of 
the 21 levels. The college students ranged in ability as already 
noted; the 2- to 5-year-olds had Binet IQ’s between 90 and 110; 
while the school children 6 to 17 years old were judged by their 
teachers as being average in intelligence. Tests were adminis¬ 
tered in the same way as outlined in the procedure section 
for the standardization group. 

After all 84 subjects had been given the appropriate intel¬ 
ligence test and the picture vocabulary test, the number of 
correct answers was tabulated for each item by age levels, and 
moving averages were calculated using five successive points. 
Per cent passing was estimated, as in Ammons and Huth’s 
study, on the basis of these moving averages between ages. 
Twenty-two more items were eliminated, either because they 
discriminated poorly between successive age levels, or because 
there were too many items passed by 50 per cent of the subjects 
at a given level. 
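The smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' worksheet: the counts are hypothetical, four subjects per level is taken from the pretest design, and shortening the window at the two ends of the age range is an assumption, since the text does not say how the end points were handled.

```python
def moving_average(values, window=5):
    """Centered moving average over `window` successive points.

    Near the ends of the series the window is shortened to the points
    that exist (an assumption; the paper does not describe end handling).
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical number passing one item (out of 4 subjects) at CA levels 2..10.
passes = [0, 0, 1, 1, 3, 2, 4, 4, 4]
subjects_per_level = 4

smoothed = moving_average(passes)
percent_passing = [100 * s / subjects_per_level for s in smoothed]
```

With only four subjects per level, raw per cent passing can move only in steps of 25, which is presumably why smoothing across neighboring levels was used before locating the 50 per cent point.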

The resulting 226 words, including 33 remaining from Am¬ 
mons and Huth’s 48, were then listed by plates and by diffi¬ 
culty level, difficulty level being the estimated MA at the 50 
per cent passing point, in terms of the intelligence tests given. 
The items in order of difficulty were: 

2 Thanks are due Mr. William L. Miller and Mr. Alvin Yoriy, principals in the 
Denver Public Schools, and Mr. Gene Gullette of Englewood High School, for 
making subjects available. 

Plate 1: pie, window, dessert, vegetable, human, seed, pane, 
sill, ventilation, agriculture, anti-socialness, transparent, rec¬ 
tangular, translucent, culinary, sector, illumination, intimida¬ 
tion, segment, depredation, physiognomy, egress. 

Plate 2: wagon, dancing, teacher, phonograph, partners, athletes, 
transport, competition, revelry, terpsichorean, ebullience. 

Plate 3: car, fight, boxing, counter, pump, customer, paying, 
clerk, fuel, sale, sport, purchase, gauge, merchant, competition, 
recreation, petroleum, retaliation, replenishment, pugnacity, 
conveyance, aggressiveness, transaction. 

Plate 4: chimney, park, shrubbery, panels, dwelling, veranda, 
panorama, urban, domicile. 

Plate 5: presents, island, surf, isolation, munificence. 

Plate 6: bird, horse, fly, wagon, transportation, insect, con¬ 
veyance, antiquated. 

Plate 7: race, catching, uniform, sport, discussion, skill, pas¬ 
sion, affection, flight, impact, amour, dialogue, discourse. 

Plate 8: house, clothes, firecracker, basket, music, laundry, 
clean, explosion, sudden, garment, neglect, dehydration, deto¬ 
nation. 

Plate 9: farm, manufacturing, skyscraper, landscape, currency, 
industrial, pecuniary, tranquillity, agrarian. 

Plate 10: chair, cup, spoon, furniture, razor, thermometer, 
steel, refreshment, liquid, mercury, container, grooming, bever¬ 
age, centigrade, tonsorial. 

Plate 11: clock, circle, numbers, locket, engraving, lobe, senti¬ 
ment, appendage, chronometer, pendant. 

Plate 12: food, meal, afraid, hot, fear, startling, nutrition, 
perspiration, tattered, vagabond, gorging, poverty, glutton, il¬ 
legality, felony, humid, vagrant, coercion, mastication, desti¬ 
tute, gourmand, itinerant, insatiable, repast, corpulence, sudor¬ 
ific, mendicant. 

Plate 13: telephone, accident, crying, cheerful, collision, de¬ 
struction, vehicles, mishap, portrait, transmitter, sympathy, 
propulsion, communication, consolation, condolence, negli¬ 
gence, bereaved, lacrimation, deleterious. 

Plate 14: policeman, safe, uniform, listening, broadcast, danger, 
protection, authority, disaster, gravitation, catastrophe, con¬ 
stabulary, fortuitous. 

Plate 15: bathtub, bed, chair, newspaper, operation, illness, 
anaesthesia, cleanliness, aseptic, crisis, leisure, immersion, re¬ 
cumbent, somnolent, displacement, perusing, supine. 

Plate 16: airplane, train, propellers, locomotive, intersection, 
harbor, aviation, altitude, marine, fuselage, nautical, roadstead. 

Subjects.—The test was administered to 600 white American-born 
subjects ranging in age from two to thirty-four years inclusive. 
Table 1 shows the number of subjects of each sex tested 
at each age level. Numbers are not equal because correct grade 
placement, rather than age, was the primary control with the 
school children. Thirty were tested at each grade level, but eleven 
18-year-olds were discarded because it did not seem possible to 
obtain a reasonably unbiased sample at this age level, leaving 589 
subjects. 

Between the ages of 2 and 17 inclusive, the subjects were 
selected by age or grade levels with respect to the fathers’ oc¬ 
cupations in direct proportion to a ten-group socio-economic 
breakdown as presented in the index of gainfully employed 
white males of the 1940 United States census (14). Where 
numbers per age level were below one subject, age levels were 
combined. 

TABLE 1 

Number and Average Chronological Age of Subjects Tested at Each Chronological Age 
Level in the Present Standardization Group 

Age          Males              Females              Total
level      N   Mean age*      N   Mean age       N   Mean age
2         15      2.5        15      2.5         30      2.5
3         15      3.5        15      3.5         30      3.5
4         15      4.5        15      4.4         30      4.4
5         15      5.4        15      5.4         30      5.4
6         11      6.5        13      6.6         24      6.6
7          8      7.3        16      7.4         24      7.4
8         21      8.4        12      8.5         33      8.5
9         15      9.5        11      9.5         26      9.5
10        17     10.5        19     10.5         36     10.5
11        10     11.4        16     11.5         26     11.4
12        16     12.4        16     12.4         32     12.4
13        20     13.5        14     13.4         34     13.5
14         9     14.5        12     14.3         21     14.4
15        19     15.5        17     15.5         36     15.5
16        11     16.4        15     16.4         26     16.4
17        16     17.4        15     17.5         31     17.5
18-34     60     25.3        60     24.6        120     25.0
Total    293                296                 589

* Years. 

For the adult group, males and females were separately con¬ 
sidered in direct proportion to the occupational status of white 
males and females between the ages of 18 and 34 as given in the 
census reports (14). The urban sample was obtained in the 
Denver area from private, parochial, and public schools; busi¬ 
ness establishments; amusement parks; and homes. The rural 
sample was secured from rural districts in Colorado and Ne¬ 
braska. More detailed information about the sampling controls 
is given in articles dealing with the subgroups of the standardization 
population (3, 5, 6, 7). 

Testing.—Preschool-age children were tested in their homes 
or in special rooms at day-care centers; school children were 
brought to rooms provided by the schools for testing; and adults 
were tested in their own homes, in testing rooms of industrial 
firms, in parks, or in a church office. All subjects were given 
an intelligence test and the standardization picture vocabulary 
test of 226 tentative items. The full Stanford-Binet, Form L, 
was given from ages 2 to 5, the Stanford-Binet vocabulary test 
from ages 6 to 17, and the Wechsler vocabulary to adults. 
Standard administration procedures were followed for each 
test (11, 13). The picture vocabulary was given first to all 
groups but the adults. 

Since testing was done by a number of examiners, a detailed 
procedure including a set of instructions was set up. The subject 
was seated opposite the examiner, with plates and recording 
sheet out of sight. The session was started by asking for personal 
information, such as name, age, and occupation of head of 
family or own occupation. The subject was told he was to be 
asked some questions that he could answer by pointing to one 
of the four pictures on a plate. It was explained that some items 
would be too hard for him, and that he should not guess, but 
just say “I don’t know.” Doubtful items were checked by asking 
the subject why he made a certain choice, asking him to define 
the item verbally, or repeating the item later. This seemed to 
discourage guessing almost completely. 

Items were scored right or wrong, and testing proceeded on 
a given plate until three successive items had been failed and 
three successive items passed. This was considered sufficient, 
since the items had been arranged in order of difficulty after 
pretesting, and items beyond the three-consecutive-pass and fail 
levels could reasonably be assumed to be passed or failed. In 
order to maintain rapport, the tester was free to introduce 
easier words at any point. Testing was started on successive 
plates at the subject’s mental level as estimated from responses 
to preceding plates. 

Item selection.—After the 589 subjects had been tested with 
the 226-word preliminary scale, an item selection was made. As 
a first step, all correct responses were tabulated by age, sex of 
subjects, and item. Words below the three-consecutive-pass 
level for each individual on each plate were considered as passed 
and those above the three-consecutive-fail level as failed. In 
order to make all values of passes comparable, per cent passing 
was calculated from number passing and number theoretically 
attempting. 
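The crediting rule just described can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' tabulation procedure: the function name and the response record are hypothetical, items on a plate are assumed to be indexed from easiest to hardest, and the administered run is assumed to begin with the three consecutive passes and end with the three consecutive failures.

```python
def credit_unadministered(n_items, administered):
    """Fill in scores for items that were never presented on one plate.

    administered: dict mapping item index (0 = easiest) to True (pass) or
    False (fail) for the items actually given. Items easier than the first
    administered item are credited as passed; items harder than the last
    administered item are counted as failed. An untested item inside the
    administered range is left as None (an assumption; the text does not
    cover this case).
    """
    first, last = min(administered), max(administered)
    scores = []
    for i in range(n_items):
        if i in administered:
            scores.append(administered[i])
        elif i < first:
            scores.append(True)    # below the three-consecutive-pass level
        elif i > last:
            scores.append(False)   # above the three-consecutive-fail level
        else:
            scores.append(None)
    return scores


# Hypothetical record for one subject on a 12-item plate: testing began at
# item 2 and stopped after three successive failures at items 7 to 9.
record = {2: True, 3: True, 4: True, 5: False, 6: True,
          7: False, 8: False, 9: False}
scores = credit_unadministered(12, record)
n_passed = sum(1 for s in scores if s)
```

Every subject then counts as having "theoretically attempted" every item, so per cent passing at an age level is the number credited with a pass divided by the number of subjects at that level.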

A CA or adult index number was found corresponding to the 
50 per cent passing point for each item for the whole group, by 
interpolation if necessary. For example: 

                        CA levels
Word              8     9    10    11
                     Per cent passing
Shrubbery        21    31    56    69

The point where 50 per cent would pass lies between CA’s 9 and 
10, actually at 9.8 by interpolation. CA’s were used in calculating 
the 50 per cent passing point through age 17, while index 
numbers were assigned to six adult levels set up on the basis of 
Wechsler vocabulary scores. A word with a rating of A1.5 
would have been passed by less than half of the lowest 20 adults 
(A1) and more than half of the next to lowest 20 adults (A2). A 
word with a rating of A6.5 would have been passed by less than 
half of the highest 20 adults (A6). Thus, relative difficulties 
were computed for all words in terms of 50 per cent passing 
points and were indicated by CA or adult index number. CA 
17 and A3 were considered to be equal levels, and difficulties can 
be figured from below 2 to A6 in one series on this basis. 
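The interpolation for "shrubbery" can be reproduced with a short sketch. Straight-line interpolation between the two adjacent levels is assumed here (the text says only "by interpolation"), and the function name is illustrative.

```python
def fifty_percent_point(levels, percents, target=50.0):
    """Return the level at which `target` per cent would pass.

    Uses straight-line interpolation between the first pair of adjacent
    levels whose per cent passing brackets the target. Returns None if
    the target is never crossed.
    """
    pairs = list(zip(levels, percents))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 < target <= p1:
            return x0 + (target - p0) / (p1 - p0) * (x1 - x0)
    return None


# Per cent passing "shrubbery" at CA levels 8 to 11, from the example above.
point = fifty_percent_point([8, 9, 10, 11], [21, 31, 56, 69])
difficulty_index = round(point, 1)   # 9 + 19/25 = 9.76, reported as 9.8
```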

Items were rejected for the following reasons: (a) inadequate 
discrimination in per cent passing between successive age levels, 
(b) regional meaning, (c) sex difference in difficulty, (d) am¬ 
biguity of denotation, (e) same item already used with another 
plate, (f) too many words at a given age level. 

(a) Words were thrown out where nearly the same number 
of subjects passed on several successive age levels, or an item 
was harder for a more advanced group. For example: 

                        CA levels
Word              8     9    10    11    12
                     Per cent passing
Gauge            24    31    61    50    72

Words eliminated on this basis were: pane, gauge, veranda, 
affection, neglect, landscape, startling, tattered, vagabond, 
illegality, vagrant, destitute, transmitter, disaster, aseptic, 
leisure, recumbent, marine, physiognomy, conveyance, detonation, 
grooming, and illness. It will be noted in the following listings 
that several words were rejected for more than one reason. 

(b) The following words were eliminated because of potentially 
varying difficulty depending on regional experience differences: 
urban, grooming, roadstead, partners, petroleum, aseptic, 
aviation, altitude, marine, fuselage. 

(c) A separate tabulation was made of the number of males 
and females above and below the 50 per cent passing point for 
each word. Where there were marked discrepancies in item diffi¬ 
culty between the sexes, a chi-square test (9) was run. Apparent 
differences between the sexes at beyond the one per cent level 
were noted in the case of the following words: detonation, avi¬ 
ation, altitude, fuselage, partners, tattered, vagabond, illegality, 
vagrant, and destitute. These were eliminated, although it is 
realized that such marked sex differences would of course occur 
a number of times by chance in this large a number of words. 

(d) Several words were rejected because they potentially 
referred to two different drawings on the same plate: customer, 
boxing, competition, catching, flight, glutton, illness, partners, 
and merchant. 

(e) “Sport” and “uniform” were tried out on two different 
cards and the better card-word combination on the basis of the 
other criteria was kept. 

(f) It was decided to have 10 words at each level from below 
2 to 5 years, 8 at each level from 6 through 16, and 8 at each 
adult level 3 through 6, or a total of 170 words in the final scale. 
Where there were too few words, as at levels 2, 4, 5, 8, 11, 12, 
14, A3, and A5, words were borrowed from adjacent levels. 
When a minimum number of words had been assigned to each 
level from below 2 to 16 and from A3 to A6, the surplus words were 
eliminated in the order in which they failed to meet the other 
criteria. It should be noted in this connection that several of the 
criteria were only relative and subjectively applicable to begin 
with, and this final process of eliminating on the basis of an 
oversupply of words at a given level led to further qualitative 
differences between the items used with the various plates. 
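The sex-difference check in criterion (c) can be illustrated with a small sketch. The counts below are hypothetical, and the layout of the table (sex by pass or fail on one word) is an assumption about how the tabulation was arranged; only the chi-square formula for a fourfold table and the one per cent critical value for one degree of freedom (6.635) are standard.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a fourfold table (no continuity correction).

    Layout assumed here:
                 passed   failed
        males       a        b
        females     c        d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts for one word near its 50 per cent passing point.
chi2 = chi_square_2x2(30, 10, 15, 25)

CRITICAL_1_PERCENT_DF1 = 6.635   # chi-square, 1 degree of freedom, p = .01
significant = chi2 > CRITICAL_1_PERCENT_DF1
```

As the authors themselves note under (c), with some two hundred words under consideration a few differences of this size are to be expected by chance, so a significant result for a single word is suggestive rather than conclusive.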

The final step was to divide the 170 items into two forms 
equal in length and as equal in difficulty as possible. The words 
were therefore arranged in order of difficulty without respect to 
plate, and assigned in groups of four, the first and fourth of 
each group going to Form A, and the second and third to 
Form B. 
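The assignment rule can be sketched as follows, using the integers 1 to 170 as stand-ins for the difficulty ranks of the words (the actual words and their order are not reproduced here). The function name is illustrative.

```python
def split_into_forms(items_in_difficulty_order):
    """Assign items to two forms in groups of four.

    Within each successive group of four, the first and fourth items go
    to Form A and the second and third to Form B.
    """
    form_a, form_b = [], []
    for position, item in enumerate(items_in_difficulty_order):
        if position % 4 in (0, 3):
            form_a.append(item)
        else:
            form_b.append(item)
    return form_a, form_b


ranks = list(range(1, 171))            # stand-in difficulty ranks, easiest first
form_a, form_b = split_into_forms(ranks)
rank_sum_difference = sum(form_b) - sum(form_a)
```

Within every complete group of four the two forms receive the same rank sum (first plus fourth equals second plus third), so the forms stay balanced in difficulty along the whole scale; with 170 items the final incomplete group of two contributes one item to each form.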


Results 

Following are the 85 words finally chosen for Form A, with 
their difficulty levels indicated: 

Plate 1: pie (1.7), window (1.7), seed (6.5), sill (6.7), transparent 
(13.3), rectangular (14.7), sector (16.0), illumination (16.0), 
culinary (17.2), egress (A6.3). 

Plate 2: athletes (8.6), competition (13.0), revelry (A4.0), 
ebullience (A6.4). 

Plate 3: counter (4.0), pump (4.4), clerk (6.4), sport (7.6), 
recreation (10.8), pugnacity (16.9), replenishment (A3.1), 
retaliation (A4.1). 

Plate 4: shrubbery (9.8), dwelling (11.7). 

Plate 5: surf (12.5), isolation (12.9). 

Plate 6: horse (1.5), wagon (2.3), insect (6.7), transportation 
(8.6), antiquated (A3.8). 

Plate 7: discussion (7.7), skill (10.9), amour (13.8). 

Plate 8: firecracker (2.7), clothes (3.0), explosion (4.9), clean 
(5.5), dehydration (A4.3). 

Plate 9: farm (4.1), currency (12.2), tranquillity (16.5), 
agrarian (A6.2). 

Plate 10: furniture (4.4), steel (6.0), refreshment (6.2), liquid 
(7.3), container (9.5), centigrade (14.5). 

Plate 11: clock (1.6), locket (3.0), numbers (3.4), engraving 
[difficulty level illegible]. 

Plate 12: hot (5.2), fear (7.4), nutrition (10.4), gorging (12.8), 
poverty (13.9), mastication (A2.6), itinerant (A4.5), coercion 
(A4.6), corpulence (A5.5), insatiable (A5.6). 

Plate 13: telephone (2.1), crying (2.9), accident (3.0), vehicles 
(9.5), destruction (10.0), portrait (10.2), communication (10.6), 
consolation (13.4), negligence (14.3), bereaved (15.4), 
deleterious (A6.2). 

Plate 14: danger (5.6). 

Plate 15: bed (1.6), newspaper (2.5), anaesthesia (11.7), 
immersion (14.6), displacement (A5.0), perusing (A5.0). 

Plate 16: propellers (3.7), harbor (8.1), locomotive (8.2), 
nautical (16.5). 

The following 85 words were chosen for Form B: 

Plate 1: vegetable (3.8), human (4.4), dessert (4.5), agriculture 
(10.7), anti-socialness (13.2), segment (15.0), intimidation 
(16.6), translucent (A2.5), depredation (A4.0). 

Plate 2: phonograph (3.3), transport (8.4), terpsichorean (A6.0). 

Plate 3: car (1.6), fight (2.8), paying (6.0), customer (6.3), 
fuel (7.5), sale (7.9), purchase (10.4), transaction (14.6), 
aggressiveness (A3.6). 

Plate 4: panels (13.9), domicile (A4.0). 

Plate 5: island (5.3), munificence (A5.7). 

Plate 6: bird (1.6), fly (2.5), conveyance (14.5). 

Plate 7: passion (12.5), impact (13.5), dialogue (13.6), discourse 
(A4.5). 

Plate 8: music (3.0), laundry (4.7), sudden (9.1), garment (9.8). 

Plate 9: manufacturing (7.2), skyscraper (7.8), industrial 
(10.0), pecuniary (A4.9). 

Plate 10: spoon (1.8), razor (3.0), thermometer (4.1), mercury 
(10.7), beverage (10.9), tonsorial (A4.4). 

Plate 11: circle (2.7), sentiment (13.9), lobe (15.5), chronometer 
(15.7), pendant (17.7). 

Plate 12: meal (3.9), perspiration (9.6), humid (14.7), felony 
(16.7), gourmand (A4.6), repast (A5.2), mendicant (A6.3). 

Plate 13: cheerful (6.8), collision (7.4), sympathy (9.6), 
mishap (11.1), propulsion (13.3), condolence (16.2), lacrimation 
(A6.3). 

Plate 14: policeman (2.5), listening (£.3), broadcast (5.9), 
uniform (6.2), safe (6.5), protection (6.7), authority (10.4), 
gravitation (11.8), catastrophe (12.0), constabulary (A3.2), 
fortuitous (A6.4). 

Plate 15: bathtub (1.6), operation (3.1), cleanliness (8.7), crisis 
(12.5), somnolent (16.2), supine (A5.5). 

Plate 16: train (1.5), airplane (1.8), intersection (8.5). 

The point levels given with the words should be considered 
only as indices of difficulty, since actual average ages within 
age groups were not used in their calculation. The average level 
of Form A is 10.7 and that of Form B is 10.5. It can be seen that 
the forms are closely comparable in difficulty for the whole 
group. 

Rough analyses of the incidence of parts of speech and of 
content areas were made for both forms combined. There are 18 
words which are direct derivatives of relatively common verbs, 
125 nouns, and 27 adjectives. Designating content areas arbi¬ 
trarily, there are 30 words of home or domestic import, 38 
referring to nature or science, 60 relating to social processes, 11 
commercial, 14 personal feelings, and 17 not readily classifiable 
in this scheme. It would seem that the test puts a premium on 
the knowledge of names referring to society and social ac¬ 
tivities. 


Discussion 

To the extent that the occupational groups in the Denver area 
and a small rural area in Nebraska are typical of those in the 
United States as a whole, norms from this test can be considered 
to be representative. There is, of course, some bias, as in all 
results based on controlled samples, but the sample is controlled 
at least as adequately as Wechsler’s, if not more so. In any case 
it provides an excellent basis for item selection. 

The words finally chosen cover the range of verbal ability 
thoroughly, and discriminate well between ability levels as 
found in different age groups. Later papers show that the two 
test forms made up of these words intercorrelate highly, and 
correlate well with other intelligence tests. The approximate age 
placement of items is only intended to facilitate the testing 
mechanically, as the test is actually a point scale. Norms for a 
general white population (3, 6, 7) and for certain population 
subgroups (2, 5) will be given for both forms in later papers. 

From a practical point of view the promise of the test is 
well borne out. Proficient testers were able to test three or four 
children an hour with both the picture vocabulary test and the 
1937 Stanford-Binet or the Wechsler vocabulary test. A high 
interest level was in evidence on the part of most of the testees. 
It seems from the above that it has been possible to construct a 
vocabulary test satisfactory for testing persons unable to speak 
or verbalize well. 


Summary 

Ammons and Huth (4) showed that it was possible to con¬ 
struct a picture vocabulary test of high reliability and validity. 
The present paper reports the procedure whereby items for such 
a test based on their 16 plates and covering the age levels from 
2 to 34 were obtained and validated. The general procedure was 
as follows: 

1. A set of 243 new items appropriate to the plates was 
listed and 48 of Ammons and Huth’s final items were retained. 

2. Of these 291 items 43 were eliminated by group discussion. 
A preliminary validation check was made on the remaining 248 
words, and 226 were retained. 

3. These 226 items were used to test 589 white American- 
born subjects ranging in age from 2 to 34 years. The sample was 
controlled by age levels for parents' occupation or own occupa¬ 
tion, age-grade placement in school, and sex. 

4. On the basis of this standardization testing, 56 items 
were eliminated because of regional bias, failure to discriminate 
between successive age levels, too many items at a level, sex 
differences, ambiguity of picture denotation, or duplication of 
words on different cards. 

5. The remaining 170 items were divided into two equal- 
length forms which were found to be almost identical in diffi¬ 
culty. 


REFERENCES 

1. Ammons, R. B. and Ammons, Helen S. The Full-Range Picture 
   Vocabulary Test. New Orleans: R. B. Ammons, 1948. 

2. Ammons, R. B. and Agüero, A. “The Full-Range Picture Vocabulary 
   Test: VII. Results for a Spanish-American School-age 
   Population.” Journal of Social Psychology. 

3. Ammons, R. B. and Holmes, J. C. “The Full-Range Picture 
   Vocabulary Test: III. Results for a Preschool-age Population.” 
   Child Development, XX (1949), 5-14. 

4. Ammons, R. B. and Huth, R. W. “The Full-Range Picture Vocabulary 
   Test: I. Preliminary Scale.” Journal of Psychology, 
   XXVIII (1949), 51-64. 

5. Ammons, R. B. and Manahan, N. “The Full-Range Picture Vocabulary 
   Test: VI. Results for a Rural Population.” 
   Journal of Educational Research, to be printed. 

6. Ammons, R. B., Arnold, P. R. and Herrmann, R. S. “The Full-Range 
   Picture Vocabulary Test: IV. Results for a School 
   Population.” Journal of Clinical Psychology, VI (1950), 
   164-169. 

7. Ammons, R. B., Larson, W. L. and Shearn, C. R. “The Full-Range 
   Picture Vocabulary Test: V. Results for an Adult 
   Population.” Journal of Consulting Psychology, XIV (1950). 

8. Atwell, C. R. and Wells, F. L. “Wide Range Multiple Choice 
   Vocabulary Tests.” Journal of Applied Psychology, XXI 
   (1937), 550-555. 

9. Lindquist, E. F. Statistical Analysis in Educational Research. 
   Boston: Houghton Mifflin Co., 1940. 

10. Seashore, R. H. and Eckerson, Lois D. “The Measurement of 
    Individual Differences in General English Vocabularies.” 
    Journal of Educational Psychology, XXXI (1940), 14-38. 

11. Terman, L. M. and Merrill, Maude A. Measuring Intelligence. 
    Boston: Houghton Mifflin Co., 1937. 

12. Van Alstyne, Dorothy. Van Alstyne Picture Vocabulary Test for 
    Pre-school Children. Bloomington, Ill.: Public School Publ. 
    Co., 1929. 

13. Wechsler, D. The Measurement of Adult Intelligence. (3rd Ed.) 
    Baltimore: Williams and Wilkins, 1944. 

14. Sixteenth Census of the United States, 1940, Population. U. S. 
    Bureau of Census, U. S. Govt. Printing Office, 1940, et seq. 



DOES FACE VALIDITY EXIST?* 


SIDNEY ADAMS 
U. S. Civil Service Commission 

Face validity, in this paper, has the meaning of “appearance 
of validity” in the language of Mosier. Mosier (6, p. 192) 
says: 

In this usage, the term ‘face validity’ implies that a test 
which is to be used in a practical situation should . . . appear 
practical, pertinent, and related to the purpose of the test . . . 
it should not only be valid, but it should also appear valid. 
This ... is not validity in any usual sense . . . [but is] an ad¬ 
ditional attribute of the test which is highly desirable in certain 
situations. 

This paper attempts a measurement of face validity by 
having a group of Federal government workers judge the ex¬ 
tent to which seven tests possessed true validity. The analysis 
of the results attempts to answer two questions: 

(1) Does face validity exist in a form that can be reliably 
measured? 

(2) What relationship does face validity bear to true validity? 
Is a test with the appearance of validity likely to be one with 
actual validity? 

Partial answers to both questions are to be found. Dr. Thelma 
Hunt had done unpublished research on the guessed validity of 
general psychology examinations and its relationship to true 
validity. Smith (8) had students evaluate seven types of examinations 
(essay, true-false, etc.) as to their suitability for 
determining the grade in an education course. On the average, 
each student’s ranking of the validity of the test types correlated 
+.31 with any other student’s estimate of validity. This indicates 
that under the conditions of the experiment, face validity 
of test type, with the same subject matter for all test 
types, is measurable, but not very reliably measurable. 

* A discussion of the relationship between Face Validity and True Validity for the 
members of a group who tried out an experimental test battery. 

The literature concurs in finding face validity a poor indica¬ 
tor of true validity. This has been pointed out by Mandell (4) 
and Mosier (6). O’Rourke (7) had judgments made on pro¬ 
posed tests for the postal service. He demonstrated that one 
test with great face validity possessed little or no true validity 
in a tryout and statistical validation which followed the judg¬ 
ments. 

The subjects for this study were 39 members of the Personnel 
Department of the United States Veterans Administration. 
Their salary grades ranged from CAF 5 to CAF 12 ($2634.80 
to $5905.20). It is probable that all members of the group 
possessed considerable knowledge of test methods. 

The individuals participating in the study took eleven tests 
during two half-day sessions. To reduce interference with work, 
one session for each group of approximately 20 persons was 
held on one day, followed by a second session on the next day. 
During the second session of each of the two groups, testing 
was suspended. Each individual in the group was asked to 
rank the first seven of the tests, on the basis of their desir¬ 
ability for use in the selection and promotion of people for 
personnel jobs of the kind and level held by members of the 
group. Each examinee was told to write, on a sheet of paper, 
a code number which was used as his designation. These num¬ 
bers provided anonymous identification throughout the study. 
The examinees were then told to write, in the time-order in 
which they had been taken, the names of the first seven tests 
of the series. These were, in order: 

Administrative Judgment Test 
Interpretation of Data (Graphs) Test 
Vocabulary Test 
English Expression Test 
Contemporary History Test 
Personality Estimates Test 
Word Identification Test 

The Administrative Judgment Test presents, for each ques¬ 
tion, a situation or problem in business or government organ¬ 
ization or procedure. Five solutions are offered for each prob¬ 
lem. The examinee is asked to choose the best of the five. The 
Interpretation of Data Test requires the examinee to read and 
interpret graphs and tables which show economic and social 
trends. The content of the third and fourth tests, Vocabulary 
and English Expression, is more or less self-explanatory. The 
Contemporary History Test is a factual examination on national 
and world events between 1915 and 1948. It had more “back¬ 
ground” questions and fewer straight news questions than do 
most current events tests. The introduction to the Personality 
Estimates Test describes the personality traits of five individ¬ 
uals. Each question then describes a certain action or states a 
certain opinion. The examinee was asked to indicate which of 
the five imaginary individuals would most probably have taken 
the action or held the opinion. The Word Identification Test 
was a type of vocabulary test in which the examinee was re¬ 
quired to identify a particular word needed to complete a 
sentence. The initial letter of the word, and the number of 
letters in the word, were given. 

The examiner described each test briefly, in order to recall 
all tests to the examinees. 2 At the time of the rating of the tests, 
a show of hands indicated that all examinees had reached the 
sixth test, Personality Estimates. Those who had not reached 
the Word Identification Test were asked to look at the sample 
questions for this test. The various tests were not separately 
timed, hence, at the time of the rating of the tests, the ex¬ 
aminees had reached different tests in the battery. 

The examinees were asked to consider which one of the 
seven tests was the best for selection for, or promotion to, 
personnel jobs in the Veterans Administration, or similar per¬ 
sonnel jobs. The tests were to be ranked according to their 
present state; no allowance was to be made for possible improve¬ 
ments in the tests. The best test in each list was marked “1 
best”, the next best as “2”, and so on to “7” for the poorest 
test. Tied ratings were to be reconsidered; the examinee was 
to break the tie arbitrarily if tests appeared tied after 
reconsideration. 3 Examinees were cautioned that “face validity” 

2 The first seven, rather than all eleven, tests were used. This was done to allow the 
tests to be rated at a convenient time in the schedule. Also, there would be probably 
considerable confusion among the judges in comparing, by recall, as many as eleven 
tests. 

3 The examiner assigned a “4” or average rating to one omitted test. Two tests 
remained tied, presumably after reconsideration by the rater. The examiner broke 
the tie by tossing a coin. 

was not the major consideration in the selection of a good test; 
that a test might sometimes be a poor selective or promotional 
aid in spite of apparent validity. 

The distribution of the ranks assigned each test is shown in 

TABLE 1 

Frequency Distribution, Mean and Variability of the Rank in Face Validity of Seven 
Tests by Veterans Administration Personnel Workers 

                                       Rank 
Test                         1    2    3    4    5    6    7      M      σ 
Administrative Judgment     19    7    6    4    1    1    1   2.18   1.52 
Interpretation of Data       2   14   13    2    0    5    3   3.28   1.71 
Vocabulary                   1    1    4    8   12    9    4   4.85   1.37 
English Expression           4    1    7   14    4    7    2   4.08   1.58 
Contemporary History         3    3    4    3    7    9   10   4.92   1.92 
Personality Estimates       10   10    2    0    4    1   12   3.74   2.51 
Word Identification          0    3    3    8   11    7    7   4.95   1.45 


TABLE 2 

Horst’s Reliability for Rated Face Validity of Tests 

              a      b      c       d        e        f        g         h 
Test          n     ΣX    ΣX²     b/n      c/n       d²     e − f   g/(n − 1) 
Admin.       39     85    275    2.18     7.05     4.75     2.30     .0605 
Interp.      39    128    534    3.28    13.69    10.76     2.93     .0771 
Vocab.       39    189    989    4.85    25.36    23.52     1.84     .0484 
English      39    159    745    4.08    19.10    16.65     2.45     .0645 
Contemp.     39    192   1088    4.92    27.90    24.21     3.69     .0971 
Personal.    39    146    792    3.74    20.31    13.99     6.32     .1663 
Word Id.     39    193   1037    4.95    26.59    24.50     2.09     .0550 
Sum               1092   5460   28.00   140.00   118.38    21.62     .5689 
                                   C                 B                  A 


Table 1. The mean and standard deviation of the rank assigned 
each test are also shown in this table. 

In terms of Horst’s (2) formula for reliability, the reliability 
of face validity ratings amounts to .911. This is shown in 
Table 2, which is arranged according to Horst’s work sheet for 
his formula. Thus, it appears that the measure of face validity 
used does have reliability. 
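
The work-sheet arithmetic behind the .911 figure can be retraced with a short sketch. It assumes the rank frequencies as read from Table 1 (the one count missing from the scan, rank 1 for Interpretation of Data, is taken as 2 so that the row totals 39), and it assumes the work sheet reduces to r = 1 − A / (B − C²/N), with A the sum of variance/(n − 1), B the sum of squared mean ranks, and C the sum of mean ranks. The function and variable names are illustrative only and are not Horst’s.

```python
# Sketch of the Table 2 work sheet. All names here are hypothetical.
# Rank frequencies (ranks 1..7) for each test, transcribed from Table 1.
FREQ = {
    "Administrative Judgment": [19, 7, 6, 4, 1, 1, 1],
    "Interpretation of Data": [2, 14, 13, 2, 0, 5, 3],  # rank-1 count inferred
    "Vocabulary": [1, 1, 4, 8, 12, 9, 4],
    "English Expression": [4, 1, 7, 14, 4, 7, 2],
    "Contemporary History": [3, 3, 4, 3, 7, 9, 10],
    "Personality Estimates": [10, 10, 2, 0, 4, 1, 12],
    "Word Identification": [0, 3, 3, 8, 11, 7, 7],
}


def worksheet_reliability(freq):
    """Return r = 1 - A / (B - C**2 / N) from a table of rank frequencies."""
    a_sum = b_sum = c_sum = 0.0
    for counts in freq.values():
        n = sum(counts)                                   # ratings per test
        sum_x = sum(rank * k for rank, k in enumerate(counts, start=1))
        sum_x2 = sum(rank * rank * k for rank, k in enumerate(counts, start=1))
        mean = sum_x / n
        variance = sum_x2 / n - mean ** 2                 # column g
        a_sum += variance / (n - 1)                       # column h
        b_sum += mean ** 2                                # column f
        c_sum += mean                                     # column d
    n_tests = len(freq)
    return 1 - a_sum / (b_sum - c_sum ** 2 / n_tests)


reliability = worksheet_reliability(FREQ)
print(round(reliability, 3))
```

Carried at full precision rather than with the two-decimal work-sheet entries, the result should still agree with the reported value to three decimals.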



























































324 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


(n = no. of ratings per test) (X = raw ratings) 
(N = no. of tests) 

\[ r = 1 - \frac{A}{B - C^{2}/N} \]

\[ r = 1 - \frac{.5689}{118.38 - \frac{784}{7}} = .911 \]



TABLE 3 

Variance Analysis of Ratings of Tests 

(1)                            (2)              (3)                      (4) 
Test                       Variance of     Deviation of mean         Square of 
                           rank of test    rank of test from 4,      Column (3) 
                                           the mean rating (rank) 
                                           of all tests 
Administrative Judgment       2.2993           −1.82                   3.312 
Interpretation of Data        2.9204            −.72                   0.518 
Vocabulary                    1.8740            +.85                   0.722 
English Expression            2.4813            +.08                   0.006 
Contemporary Affairs          3.6609            +.92                   0.846 
Personality Estimates         6.2935            −.26                   0.068 
Word Identification           2.0994            +.95                   0.902 

Sum                          21.6288             0.00                  6.374 
                               × 39                                    × 39 

                           Within variance                  Between variance 
Sum of squares                843.5432                          248.586 
Degrees of freedom               266                               6 
Quotient                        3.171                            41.431 
Natural log of quotient        1.15426                          3.72469 

3.72469 − 1.15426 = 2.57043 
2.57043 ÷ 2 = 1.28522 (z) 



TABLE 4 

Rank Correlations between Rated Estimates of Test Validity for Random Pairs of 
Examinee-Raters 

Pair     ρ       σρ        Pair     ρ       σρ        Pair     ρ       σρ 
  1    +.250   0.356         7    −.107   0.374        14    +.036   0.377 
  2    −.286   0.349         8    −.179   0.367        15    −.643   0.230 
  3    +.393   0.324         9    +.750   0.174        16    +.286   0.349 
  4    −.071   0.376        10    +.714   0.194        17    +.071   0.376 
  5    +.321   0.342        11    −.214   0.362        18    −.071   0.376 
  6    +.608   0.246        12    +.607   0.247        19    +.393   0.324 
                            13    +.518   0.284 



This was confirmed by a variance analysis following the 
method of Mills (5), which showed the between-variance greater 
than the within-variance at a probability within the one per 
cent level; z was equal to 1.28. These calculations are shown 
in Table 3. It thus appears that face validity is a definite 
entity, whether or not face validity is related to true validity. 
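
As a rough check on the z statistic, the sketch below assumes that z is half the natural logarithm of the ratio of the two mean squares, which is what the arithmetic of Table 3 appears to show. The sums of squares are those read from Table 3; the degrees of freedom (6 between tests, 266 within tests) are assumed from 7 tests and 39 raters. The function name is hypothetical.

```python
import math


def fisher_z(ss_between, df_between, ss_within, df_within):
    """Half the natural log of the ratio of between to within mean squares."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return 0.5 * math.log(ms_between / ms_within)


# Values as read from Table 3; d.f. assumed as 7 - 1 and 7 * (39 - 1).
z = fisher_z(ss_between=248.586, df_between=6,
             ss_within=843.5432, df_within=266)
print(round(z, 2))
```

A logarithm computed directly differs slightly in the later decimals from the tabled logarithms used in Table 3, so agreement should be expected only to two or three places.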














Some tests do appear to this group of examinees to be more 
valid than others. 

A different approach to this problem is by determining 
whether face validity shows any reliability. This is demonstrated 
by showing the rank correlations between raters. Of the 39 
raters, 38 were paired in random pairs. The final digits of 
logarithms in corresponding positions on successive pages in a 
logarithmic table were used to determine the pairing of the 
individuals, each of whom had a code number. Rank correlations 
(ρ) were determined for each pair. These correlations 
are shown in Table 4. For computation of the standard error 
of ρ, see (1, p. 123). 
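
The pairing step can be sketched in modern terms. The example below swaps the logarithm-table digits for a seeded pseudo-random shuffle, so it is a stand-in for the procedure rather than a reproduction of it; the code numbers 1 to 39 and the seed are hypothetical.

```python
import random


def random_pairs(code_numbers, seed=None):
    """Shuffle the raters and pair them off; an odd rater is left unpaired."""
    rng = random.Random(seed)
    shuffled = list(code_numbers)
    rng.shuffle(shuffled)
    usable = len(shuffled) - len(shuffled) % 2   # drop one rater if count is odd
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, usable, 2)]


# Hypothetical code numbers 1..39; 38 raters are paired and one is left over.
pairs = random_pairs(range(1, 40), seed=1949)
print(len(pairs))
```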


TABLE 5 

[The numerical entries of this table are not legible in the source. Rows: 
Administrative Judgment; Interpretation of Data; Vocabulary; English Expression; 
Contemporary History; Personality Estimates; Word Identification.] 



The mean ρ amounts to +.178, and the median to +.250. 
Thus the relationship between the ranks of the tests by any two 
individuals, while very small, is positive. A further measure of 
the interrelationship of the pairs can be obtained by the use of 
the average intercorrelation formula. The use of this formula 
gives an answer to this question: Do the rank correlations com¬ 
puted in Table 4 appear representative of all the possible 
correlations which could be computed between the ranks of 
tests for different pairing combinations of subjects? A total 
of 741 correlations would be possible. The mean of these correlations 
has been computed by the average intercorrelation 
formula, as used by Smith (8), and explained by Kelley (3). 
The formula used was 

\[ r_{11} = 1 - \frac{a(4N + 2)}{(a - 1)(N - 1)} + \frac{12\,\Sigma S^{2}}{a(a - 1)N(N^{2} - 1)} \]
















In this study, a is 39, the number of subjects. N is 7, the number 
of ranks, which is equal to the number of tests. S is the 
sum for each test, of the square of each rank, times the number 
of times the test was assigned to that rank. The computation 
of $\Sigma S^{2}$ is shown in Table 5. The value of $r_{11}$ is +.33. This 
agrees fairly well with the observed values of the 19 rank correlations. 
Thus, by the use of both variance analysis and correlation, 
the measurable existence of face validity has been 
shown. 
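
The idea that a single formula can stand in for all possible pairwise rank correlations can be illustrated on a small case. The sketch below assumes untied ranks and assumes that S is taken as the sum of the ranks a test receives over all raters, which is the reading under which the formula reproduces the mean of the pairwise coefficients. The three raters and four tests are hypothetical data, not the study’s.

```python
from itertools import combinations
from math import comb


def spearman_rho(x, y):
    """Rank-difference correlation, 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def average_intercorrelation(rankings):
    """Mean of all pairwise rank correlations, from rank sums alone."""
    a = len(rankings)          # number of raters
    n = len(rankings[0])       # number of tests (ranks)
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    sum_s2 = sum(s * s for s in rank_sums)
    return (1 - a * (4 * n + 2) / ((a - 1) * (n - 1))
            + 12 * sum_s2 / (a * (a - 1) * n * (n * n - 1)))


# Hypothetical data: three raters each ranking four tests.
rankings = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
by_formula = average_intercorrelation(rankings)
by_pairs = (sum(spearman_rho(x, y) for x, y in combinations(rankings, 2))
            / comb(len(rankings), 2))
print(round(by_formula, 4), round(by_pairs, 4))
print(comb(39, 2))  # number of distinct pairs among 39 raters
```

The last line confirms the count of 741 possible pairs among 39 raters.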

What is the relationship of face validity to actual validity? 
The true validity of the tests was measured by correlating the 

TABLE 6 

Relationship of True Validity to Face Validity 

(1)                           (2)                   (3)                     (4) 
Test                      Face      Rank        True validity  Rank     True validity    Rank 
                          validity  Col. (2)    (grade         Col. (3) (others’         Col. (4) 
                                                criterion)              averaged 
                                                                        judgment 
                                                                        criterion) 
Administrative Judgment     2.18      1            .52           1         .50             2 
Interpretation of Data      3.28      2            .16           5         .31             4½ 
Vocabulary                  4.85      5            .26           3         .52             1 
English Expression          4.08      4            .13           6         .18             7 
Contemporary History        4.92      6            .08           7         .25             6 
Personality Estimates       3.74      3            .32           2         .42             3 
Word Identification         4.95      7            .22           4         .31             4½ 

ρ = +.50 (grade criterion). 
ρ = +.31 (judgment criterion). 


test scores with the average rating of each participant. The 
ratings of each participant were made by other participants, 
who claimed knowledge of his work. Another measure of true 
validity used as a criterion was the civil service grade of the 
participants. 

In Table 6 the face validity and both kinds of true validity 
are shown for each of the seven tests. The rank of the test 
in each of these characteristics is shown. The rank correlation 
of the face validity is +.31 with true validity determined with 
a judgment criterion. It is +.50 with true validity computed 
against a salary-grade criterion. 

In Table 7 it is seen that the relationship of the true validity 
of a test to its validity estimated by one individual is very 











TABLE 7 

Rank Correlations between Face Validity and True Validity for Each Participant in the Test Tryout 

[The body of this table (rank correlations for the individual participants) is not 
legible in the source.] 












small and undependable. The numbers used in Table 7 are 
not the code numbers used in the examination. 

Conclusions 

1. Face validity appears to exist, at least for the tests, subjects 
and conditions described in this paper. Examinees, exposed 
to several tests, agreed with measurable consistency 
that some of the tests appeared more valid than others. 

2. Wide differences often exist between the judgments made 
by different individuals as to which tests possess face validity. 

REFERENCES 

1. Dunlap, Jack W. and Kurtz, Albert K. Handbook of Statistical 
   Nomographs, Tables and Formulas. Yonkers-on-Hudson: 
   World Book Company, 1932. 

2. Horst, Paul. “A Generalized Expression for the Reliability of 
   Measures.” Psychometrika, XIV (1949), 21-32. 

3. Kelley, Truman L. Statistical Methods. New York: Macmillan 
   Company, 1924. 

4. Mandell, Milton M. “Facts and Fallacies of Personnel Testing.” 
   Personnel, XXIV (1947). 

5. Mills, Frederick C. Statistical Methods. New York: Henry Holt 
   and Co., 1938. 

6. Mosier, Charles I. “A Critical Examination of the Concept of 
   Face Validity.” Educational and Psychological Measurement, 
   VII (1947), 191-206. 

7. O’Rourke, L. J. “Saving Dollars and Energy by Personnel Research 
   and Investigation in the Interest of the Postal Service.” 
   Journal of Personnel Research, IV (1926), 351-364, 
   433-450. 

8. Smith, Francis T. “The Relationship Between Objectivity and 
   Validity in the Arrangement of Items in Rank Order.” 
   Journal of Applied Psychology, XX (1936), 154-160. 



ADMINISTRATION OF THE PURDUE PEGBOARD 
TEST TO BLIND INDIVIDUALS 


JAMES W. CURTIS 

Illinois Division of Vocational Rehabilitation 

Aptitude testing of the blind involves certain difficulties 
not always encountered in connection with normal individuals 
or individuals with other types of physical handicaps. Perhaps 
the two principal difficulties are the limitations in potential 
vocational placements, and the limitations in available testing 
instruments. Although the past decade has witnessed increas¬ 
ing attention to the development of suitable instruments, a 
substantial number of aptitude factors still present relatively 
difficult problems of determination, as applied to blind persons. 

In successful rehabilitation and job placement, the necessity 
for careful evaluation increases in direct proportion to the 
severity of the handicap. Improvisation often becomes a neces¬ 
sary part of the repertoire of the psychological tester, particu¬ 
larly in those instances in which blindness is the handicap. 

It was noted by the author, on numerous occasions, that job 
placement of blind individuals by the Illinois Division of Voca¬ 
tional Rehabilitation involved an element of finger-hand dex¬ 
terity not satisfactorily measured by commonly used adapta¬ 
tions of standard manipulative and dexterity tests such as the 
Pennsylvania Bi-Manual Work Sample and the Minnesota Rate 
of Manipulation Test. After some trial-and-error investigation, 
it was determined that the Purdue Pegboard Test could be used, 
with very little special adjustment, in a quite satisfactory 
manner with blind individuals. It was found, moreover, that 
the results so obtained provided a significant addition to the 
results obtained from other manipulation and dexterity tests, 
in standard use with the blind, such as the two mentioned 
above. 

The utility of any “standard test,” in conditions of special¬ 
ized use, is in inverse proportion to the complexity of the special 


adjustments necessary for such use. At the same time, the 
fewer the necessary adjustments, the greater will be the ad¬ 
herence to the original standardized conditions and, conse¬ 
quently, the more significant will be the results from the stand¬ 
point of the original test purpose. Fortunately, the Purdue 

TABLE 1 

Purdue Pegboard Test 
Norms for the Blind 
(N = 70) 

Percentile    Insertion    Assembly 
    99           40           38 
    95           39           36 
    90           38           34 
    80           34           32 
    70           31           30 
    60           29           28 
    50           26           26 
    40           2?           25 
    30           23           23 
    20           21           21 
    10           17           18 
     5           14           14 
     1            4            2 



Pegboard Test may be administered to the blind with only the 
following deviations from standard instructions: 

a. As the examiner introduces the test, he assists the subject 
in manually examining the board, locating the cups, exam¬ 
ining the pins, sleeves and washers, and identifying the rows 
of holes. 

b. At the start of each sequence, the tester places one pin (or 
one assembly) in the first hole of the row of holes to be used. 

In the two-hand sequence the operator places a pin in the 
first hole of both rows. The pin or pins so placed do not count 
in scoring but serve as orientation points for the blind sub¬ 
ject. No additional deviation from the original instructions 
is necessary. It is desirable, however, to have the subject re¬ 
examine the sleeves and washers before proceeding with 
the assembly section. 

Up to the present time, 70 blind subjects have been tested 
by the Purdue Pegboard Test in conformity with the instructions 
outlined in the above paragraph. The age range was 18 
to 44 years, with the distribution of ages approximating a bell 
curve. There were 45 male subjects and 25 females. The IQ 
range was 89 to 130, with an average of 107. Each of the 70 








subjects had voluntarily contacted the Division of Vocational 
Rehabilitation for rehabilitation. The 70 were tested in turn, 
according to their date of application for services. Other than 
this, no selective factors were in operation. Norms obtained 
from these 70 cases are presented in Table 1, in terms of per¬ 
centiles. 

The scores included in Table 1, under the designation “in¬ 
sertion,” represent the total number of pins inserted by right 
hand, by left hand, and by both hands, for one trial. The scores 
designated as “assembly” represent the total number of pieces 
assembled in one trial, on the section of the test designated 
“assembly” by the publisher. 

A study of the insertion scores in Table 1, using the 1948 
Purdue Pegboard Profile Sheet for comparison, will show that 
the 99th percentile (blind norms) is equivalent to the apth 
percentile, for industrial applicants. The 50th percentile (blind 
norms) is below the first percentile level, for industrial appli¬ 
cants. A comparison of the assembly section norms of Table 
1 with the 1948 Profile Sheet, shows that the 99th percentile 
(blind norms) is equivalent to the 80th percentile, and that the 
50th percentile (blind norms) is equivalent to the 15th per¬ 
centile. 

Although an insufficient period of time has elapsed to permit 
a statistically reliable validation of the norms for the blind, 
on the basis of achievement in training or employment involv¬ 
ing finger-hand dexterity, preliminary results have indicated 
the strong advisability of utilizing such data as a part of the 
vocational testing complex. 

Summary 

The Purdue Pegboard Test was administered to 70 blind individuals, 
subject only to minor modifications in administrative 
technique. Tentative norms, based on these administrations, 
were determined in terms of percentiles. Incomplete 
results suggest a significant level of utility for measurements 
obtained by this technique, in vocational guidance and placement 
of blind individuals. 



EVALUATING PSYCHOMETRIC PROFICIENCY 

FRANK M. du MAS 
American Council on Education 

Introduction 

Individuals who have the responsibility of training applied 
psychologists are often faced with the problem of evaluating 
the ability of their proteges to administer individual tests. 
There are two considerations involved: first, the evaluation of 
the student as compared to other students; second, the evaluation 
of the student as compared to a professional standard of 
competency. Because of the guild-type training received, the 
evaluation may be highly subjective. It would seem, therefore, 
that an objective method of evaluating psychometric profi¬ 
ciency would serve as a useful supplement to the generalized 
subjective evaluation of the supervising clinician. 

Time is important to the busy clinician. The rationalization 
of the two procedures that follow was made with this con¬ 
stantly in mind. The problem may be stated thus: Can an 
objective procedure be worked out which the supervising cli¬ 
nician can apply routinely in appraising psychometric profi¬ 
ciency? Of the two tests that follow, the first can be made in a 
minute or so and the second should seldom require more than 
three or four minutes. 

Analysis of the Standard Error of Measurement 

The square of the standard error of measurement, $\sigma_{xx}^{2}$, 
may be regarded as the variable¹ error variance of a test score. 
This variance has two components: the variance due to the 
psychometrician, $\sigma_{p}^{2}$, and the variance not due to the psychometrician, 
$\sigma_{np}^{2}$, as 

\[ \sigma_{xx}^{2} = \sigma_{p}^{2} + \sigma_{np}^{2}. \qquad (1) \]

1 Errors may be classified as either variable or systematic. The present author would 
like to point out that this paper does not evaluate systematic error. The present 
method, therefore, is applicable only when an evaluation of variable error is desired. 
The method suggested in this paper is meaningless when only systematic error is 
present. 



The variance $\sigma_{p}^{2}$ may be regarded as composed of two components 
also: the variance due to the psychometrician himself, 
$\sigma_{ph}^{2}$, and the variance due to the situation in which the test 
is administered, $\sigma_{s}^{2}$. But since the trained psychometrician is 
responsible for giving the test under specified conditions, we 
may write 

\[ \sigma_{p}^{2} = \sigma_{ph}^{2} + \sigma_{s}^{2}. \qquad (2) \]

The variance $\sigma_{np}^{2}$ may be regarded as having two components: 
the variance due to the testee, $\sigma_{t}^{2}$, and the variance 
due to the test instrument, $\sigma_{i}^{2}$. That is, 

\[ \sigma_{np}^{2} = \sigma_{t}^{2} + \sigma_{i}^{2}. \qquad (3) \]

It is obvious, however, that $\sigma_{i}^{2}$ is usually infinitesimal when 
the same test is used. When a parallel form is used interchangeably, 
$\sigma_{i}^{2}$ may increase but usually only slightly. Since $\sigma_{i}^{2} \to 0$ 
we may disregard this quantity and write 

\[ \sigma_{np}^{2} = \sigma_{t}^{2}. \qquad (4) \]

It follows that 

\[ \sigma_{xx}^{2} = \sigma_{p}^{2} + \sigma_{t}^{2}. \qquad (5) \]

It is obvious that an unskilled psychometrician should be 
less reliable than a skilled psychometrician, i.e., the square of 
the standard error of measurement derived from test scores 
obtained by an unskilled psychometrician, $\sigma_{xxu}^{2}$, should be 
larger than the square of the standard error of measurement 
derived from test scores obtained by a skilled psychometrician, 
$\sigma_{xxs}^{2}$, as 

\[ \sigma_{xxu}^{2} > \sigma_{xxs}^{2}. \qquad (6) \]

Since (6) would be true even if both the skilled and unskilled 
psychometrician used the same testees, and the same test in 
the same situation, it follows that (6) is due to the fact that 
the variance due to an unskilled psychometrician, $\sigma_{up}^{2}$, should 
be larger than the variance due to a skilled psychometrician, 
$\sigma_{sp}^{2}$. 

We should expect, therefore, $\sigma_{xxu}^{2} > \sigma_{xxs}^{2}$ because 

\[ \sigma_{up}^{2} > \sigma_{sp}^{2}. \qquad (7) \]

Now, the psychometricians who standardized a particular 
test may be regarded as expert or skilled psychometricians. 




Therefore, we may substitute the square of the standard error 
of measurement as published in the standardization data, $\sigma_{xx}^{2}$, 
for the variance in relation (6) as 

\[ \sigma_{xxu}^{2} > \sigma_{xx}^{2}. \qquad (8) \]

The quantity $\sigma_{xx}$, which is the published standard error of 
measurement for a particular test, will be used in the procedures 
that follow as the standard error of measurement desired 
from a skilled psychometrician. The square of the quantity, 
$\sigma_{xx}^{2}$, will be regarded as the variance of a population of measures 
obtained by a skilled psychometrician on a single individual. 
It follows that the degrees of freedom for $\sigma_{xx}^{2}$ will be 
$\infty$. 

Criterion of Psychometric Proficiency I 
Procedure .— 

a) Regard the first test score, $S_{1}$, obtained from a testee by a 
psychometrician as the mean of a population of such measures. 

b) Regard the second test score, $S_{2}$, obtained by the psychometrician 
from the testee as a deviation from the mean. 

c) Regard $\sigma_{xx}$ as the standard deviation of a normally distributed 
population of such measures. 

d) Criterion of psychometric proficiency, I, is attained when 
the second test score does not deviate significantly from 
the first test score, i.e., when the null hypothesis is acceptable. 

e) Test the null hypothesis by applying the following formula 

\[ x = \frac{S_{2} - S_{1}}{\sigma_{xx}}, \qquad (9) \]

where x = deviation from the mean in terms of sigma as 
the unit. 

f) Enter the normal probability table with x and obtain the 
probability area, A, lying between this deviate value and 
the mean. Multiply this area by 2 and subtract this product 
from 1. If the decimal point in the remainder be moved two 
places to the right we then have the level of confidence at 
which the null hypothesis may be rejected. These operations 
may be summarized as follows: 

\[ \text{L. C.} = 100(1 - 2A). \qquad (10) \]

Evaluation: The assumption given in (a) above is implicit in all 
test scores obtained in the clinic. It is the rule rather than the 
exception that only one test score of a kind is obtained from 




the testee and this score is considered as an estimate of the 
mean of a population of such measures. 

The level of confidence at which the null hypothesis may be 
rejected is set by the supervising clinician. The severity of the 
criterion may be increased as the training progresses by merely 
setting the level of confidence for rejecting the null hypothesis 
at a lower point, say, from the 10 per cent L. C. to the 40 
per cent L. C. 

Application: Let us assume that a group of psychometricians 
are to be evaluated. Table 1 demonstrates the actual computation 
necessary. Explanation of Table 1 follows: 

Col. 1: Names of evaluated psychometricians. 

Col. 2: The two test scores obtained by each psychometrician, 
($S_{1}$, $S_{2}$). Let these be Wechsler-Bellevue IQ’s. 


TABLE 1 

Evaluation by Criterion I 

Col. 1    Col. 2               Col. 3    Col. 4    Col. 5    Col. 6 
Smith     132, 126                6      5.674      1.06      29% 
Jones     [illegible], 92         2      5.674       .35      73% 
Brown     108, [illegible]       16      5.674      2.82       1% 



Col. 3: The difference between the two test scores obtained by 
each psychometrician. 

Col. 4: The published standard error of measurement for the 
particular test being used; in this example the Wechsler-Bellevue 
test of intelligence (1). This is $\sigma_{xx}$. 

Col. 5: x as defined in formula (9). 

Col. 6: Approximate level of confidence at which the null hy¬ 
pothesis can be rejected. 

The psychometricians may be compared as follows (see Col. 
6): Jones is best, Smith is next best, and Brown is the poorest 
in psychometric proficiency in regard to the Wechsler-Bellevue 
test of intelligence. 

If the criterion of proficiency were set at the 20 per cent 
L. C., then Jones and Smith passed the criterion and Brown 
failed the criterion. 
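
Criterion I can be sketched in a few lines. The example assumes that the tabled area A equals the normal integral from the mean to x, computed here with the error function in place of a printed table, and it takes the score differences (6, 2 and 16) and the standard error of measurement (5.674) from Table 1. The function name and the pass rule written in code are illustrative.

```python
import math


def criterion_one(difference, sem):
    """Return (x, L.C.) for a difference between two scores on one testee."""
    x = abs(difference) / sem
    area = 0.5 * math.erf(x / math.sqrt(2))   # area A between the mean and x
    level = 100 * (1 - 2 * area)              # L.C. = 100(1 - 2A)
    return x, level


SEM = 5.674  # published standard error of measurement, Table 1, Col. 4
results = {name: criterion_one(diff, SEM)
           for name, diff in {"Smith": 6, "Jones": 2, "Brown": 16}.items()}

# A psychometrician passes when the null hypothesis cannot be rejected
# at the level set by the supervising clinician (20 per cent here).
passed = {name: level > 20 for name, (x, level) in results.items()}
for name, (x, level) in results.items():
    print(name, round(x, 2), round(level, 1), passed[name])
```

Because x is carried at full precision instead of being rounded to two decimals before entering the table, the levels can differ from Col. 6 by a fraction of a per cent.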





Criterion of Psychometric Proficiency II 
Procedure .— 

a) Regard two or more scores obtained by a psychometrician 
from a single testee as a random sample from a normal 
population of such measures. 

b) Regard $\sigma_{xxp}^{2}$ as an estimate of the variance of this population: 
$\sigma_{xxp}^{2} = \Sigma d^{2}/(n - 1)$, where $\Sigma d^{2}$ is the sum of the 
squared deviations from the mean of the sample in (a) 
and “n” is the number of scores in the sample. 

c) Regard $\sigma_{xx}^{2}$ as an estimate of the variance of a normal 
population of a set of measures obtained from the testee 
by a skilled psychometrician. 

d) Criterion of psychometric proficiency, II, is attained when 
$\sigma_{xxp}^{2}$ is not significantly greater than $\sigma_{xx}^{2}$. 


TABLE 2 

Evaluation by Criterion II 

Col. 1    Col. 2              Col. 3              Col. 4               Col. 5    Col. 6 
Joe       133, 120, 125       32.19 (d.f. = ∞)     43.00 (d.f. = 2)     1.34      >5% 
Tom       90, 86, 94, 98      32.19 (d.f. = ∞)     26.67 (d.f. = 3)     1.21      >5% 
Bill      92, 112, 96         32.19 (d.f. = ∞)    112.00 (d.f. = 2)     3.48       3% 



e) Test the null hypothesis by first applying the formula 

\[ F = \frac{\sigma_{xxp}^{2}}{\sigma_{xx}^{2}}, \qquad (11) \]

where the d.f.² of $\sigma_{xx}^{2}$ may be taken as $\infty$ and the d.f. 
of $\sigma_{xxp}^{2}$ is n − 1. 


Evaluation: The supervising clinician should first inspect $\sigma_{xxp}^{2}$, 
and if $\sigma_{xxp}^{2} < \sigma_{xx}^{2}$, the psychometrician being evaluated is 
less variable than the skilled psychometrician (at least on the 
basis of this estimate of his variance) and the F test need not 
be made. However, if the supervising clinician wishes to know 
whether or not the psychometrician being evaluated is significantly 
less variable than the skilled standardization psychometrician 
he may make the F test 

\[ F = \frac{\sigma_{xx}^{2}}{\sigma_{xxp}^{2}}, \qquad (12) \]

where the degrees of freedom are the same as in (11). 

2 The degrees of freedom of $\sigma_{xx}^{2}$ is exactly the size of the standardization sample 
minus one. Since the standardization sample is usually several hundred, the error 
introduced by setting the d.f. always at ∞ is very small. The utility is that 
only one line of the F table need be used. 

From Formula (6) it follows that we should expect $\sigma_{xxp}^{2} > \sigma_{xx}^{2}$. 
The application of formula (11), applied only when $\sigma_{xxp}^{2} > \sigma_{xx}^{2}$, 
will indicate whether or not the in-training psychometrician 
is significantly more variable, and therefore less reliable, 
than the skilled standardization psychometrician. 

Application: Let us assume that a group of psychometricians 
are being evaluated. Table 2 represents the actual computation 
necessary. Explanation of Table 2 follows: 

Col. 1: Names of evaluated psychometricians. 

Col. 2: Wechsler-Bellevue IQ’s obtained by each psychome¬ 
trician. 

Col. 3: The square of the standard error of measurement as 
published for the Wechsler-Bellevue test of intelligence, 
i.e., (5.674)². This is $\sigma_{xx}^{2}$. 
d.f. = degrees of freedom. 

Col. 4: Estimated variance for a population of such measures 
as sampled in Col. 2. This is $\sigma_{xxp}^{2}$. 
d.f. = degrees of freedom. 

Col. 5: Fisher’s F ratio. 

Col. 6: Level of confidence for rejecting the null hypothesis. 

The psychometricians may be compared as follows: Tom is 
best, Joe is next best and Bill is the poorest psychometrician 
in regard to the Wechsler-Bellevue test of intelligence. From 
26.67 < 32.19, we know that Tom is less variable and, therefore, 
probably more reliable than even the standardization psychometrician. 
However, Tom is not significantly more reliable. 

If the criterion of proficiency had been set at the 5% L. C., 
then Tom and Joe passed the criterion and Bill failed the cri¬ 
terion. 
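
Criterion II can be retraced from the scores in Table 2 with a short sketch. It assumes the sample variance is computed with n − 1 in the denominator, as in step (b), and it uses 5 per cent critical values of F for an infinite number of denominator degrees of freedom (3.00 for 2 d.f., 2.60 for 3 d.f.) as commonly tabled. The function name and the small lookup table are illustrative, not part of the original procedure.

```python
from statistics import variance

SEM_SQUARED = 5.674 ** 2            # published error variance, about 32.19
F_CRIT_5PCT = {2: 3.00, 3: 2.60}    # tabled 5% points, denominator d.f. = infinity


def criterion_two(scores, sem_squared=SEM_SQUARED):
    """Return (estimated variance, F, passed) for one psychometrician."""
    estimate = variance(scores)      # sum of squared deviations / (n - 1)
    if estimate <= sem_squared:
        # Less variable than the standard: formula (12), no failure possible.
        return estimate, sem_squared / estimate, True
    f_ratio = estimate / sem_squared                    # formula (11)
    return estimate, f_ratio, f_ratio < F_CRIT_5PCT[len(scores) - 1]


SCORES = {"Joe": [133, 120, 125], "Tom": [90, 86, 94, 98], "Bill": [92, 112, 96]}
evaluation = {name: criterion_two(s) for name, s in SCORES.items()}
for name, (estimate, f_ratio, passed) in evaluation.items():
    print(name, round(estimate, 2), round(f_ratio, 2), passed)
```

With only two or three numerator degrees of freedom the lookup table is enough; a fuller version would need the complete line of the F table for infinite denominator degrees of freedom.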

REFERENCE 

1. Wechsler, D. Measurement of Adult Intelligence. Baltimore: Wil¬ 
liams & Wilkins, 1945. 



INTEREST AND PERSONALITY MEASURES OF 
VETERAN AND NON-VETERAN UNIVERSITY 
FRESHMAN MEN 


KATHERINE K. FASSETT 

University of Wisconsin 

Fifty veterans and fifty-six non-veterans, all freshman men 
coming to the University of Wisconsin Student Counseling 
Center in 1946-48, have been investigated with respect to their 
interest scores on the Strong Vocational Interest Blank and their 
personality scores on the Minnesota Multiphasic Personality 
Inventory. Both the Strong and the Multiphasic are routinely 
administered to all students coming to the Counseling Center; 
Multiphasic scores are K-corrected (3), and the Strong scored 
on thirty-four occupations in eleven groups. The ages of the non-veterans 
ranged from 17 to 19 years with the median at 18; of 
the veterans, from 20 to 30, with the median at 22. The length 
of service of the veterans ranged from 24 to 72 months, the 
median being at 33 and a half months. All had some service 
outside of the continental United States. The academic classifications 
of Letters and Science, Engineering, and Agriculture 
are represented in both groups. 

Interests, as measured by patterning on the Strong (1), show 
no significant differences between the two groups of men. Judged 
by the total number of A and B+ scores, the veterans have 
more fully crystallized interests than do the non-veterans. This 
difference is significant beyond the one per cent level of con¬ 
fidence, the veterans giving more of the high scores than do the 
non-veterans. Such increase in crystallization of interest with 
added age has been found in previous investigations (4). In the 
case of the groups compared in the present study, there is no 
overlap in age; the difference here found might consequently be 
expected in terms of age alone. However, the studies on which 
such a difference has been demonstrated have not had the 
factor of war experience affecting the older group, and it has 


sometimes been thought that the service experience of veterans 
may have hindered the maturing of their vocational interests 
which would otherwise have come about with added age. The 
comparison of high scores in the present study indicates that 
such maturing has gone on in the veteran group, although to 
what extent, as compared with men of similar age in non-war 
years, cannot be judged from this evidence. 

Multiphasic measures show no significant differences between 
the two groups in central tendencies on any scale; both groups 
have mean profiles which run close to the 50 T-score mean of the 
general population. The mean T-score for no scale, for either 
group of men, was higher than 59, or lower than 49. Greater 
variability for the veterans was shown on several of the scales 


TABLE 1 

Standard Deviations of Multiphasic T-Scores* 
Comparison between Veterans and Non-Veterans 

             Veterans      Non-veterans 
             N = 50        N = 56 
Scale        S.D.          S.D.            C.R. 
Hs            9.22          6.93           2.01 
Pt           15.22         10.88           2.35 
Mf           12.41          8.71           2.47 
D            15.12         10.66           2.42 

* Scales which are not listed show differences significant at > .05 level of confidence. 


(Table 1); and the veterans appear somewhat more often than 
do the non-veterans in the score ranges indicative of possible 
personality deviations. Counting the total number of scores on 
all scales for each group, and computing the percentage of such 
total which falls at or above 75 t-score, the veterans show a 
greater percentage of high scores than do the non-veterans. On 
the Mf scale, a larger percentage of the veterans than of the 
non-veterans score at and above a t-score of 70. Both of these 
differences are significant beyond the one per cent confidence 
level. As the Mf scale is usually interpreted, the fact of the 
veterans scoring higher on the scale would indicate the presence 
of more feminine tendencies on their part than on the part of 
the non-veterans. The assumption is often held that young men 
of college age show some aggression to over-protection by their
mothers, and take on some feminine attributes in order to
compete with the mothers. The means of both groups of men in
this study run somewhat higher than the mean of the general 
population, but the fact that the veterans’ mean is significantly 
above the non-veterans’, can probably not be accounted for by 
mother relationships, since, on this basis alone, the non-veterans 
would be expected to run higher, inasmuch as they have re¬ 
cently been closer to their homes. The presence of more feminine 
tendencies on the part of the veterans might be due to the fact 
that, having been separated from considerable feminine contact 
for some time, they react to such contacts when entering the 
college situation in a coeducational institution—possibly show¬ 
ing an aggression towards, or competition with, the female 
student body which had been predominant on the campus be¬ 
fore the return of the veterans. The fact that these young men 
have chosen to undertake an education rather than to get some 
gainful employment immediately might be the result of one or 
both of two tendencies commonly considered to be feminine: 
an interest in cultural pursuits, and a dependence, in this case 
perhaps a desire to be sheltered by society as represented by the 
Government and the University. Such a desire for dependence 
could very conceivably be the outgrowth of the youths having 
been pushed into the mature role of becoming aggressors for the 
sake of society, at an age and stage of development where many 
of them were not ready for such a role. 

The Si scale (2) indicates that both groups are generally like 
average college students in tendencies toward social participa¬ 
tion, despite the fact that, as freshmen, they are new to the 
University, and are, further, students who have demonstrated a 
felt need for specialized help from the Counseling Center. 
Students come to this Counseling Center on a purely voluntary 
basis. 

These conclusions cannot be applied to student groups as a 
whole without reservation, since this study was limited to 
freshman men; and even these may not be typical of freshman 
men as a whole, since little is known at present as to what “type” 
of student seeks out the services of the Counseling Center. It 
does not seem unlikely, however, that the subjects of the present 
study are more or less representative of today’s student body;
and inasmuch as psychometric tests are, at the present stage of
student personnel work, used more frequently on students who 
come for help to some specialized person or agency than on the 
entire college population, it is hoped that the present findings 
may be of some use to those who are attempting to aid students 
in their adjustment during the post-war period. 

REFERENCES 

1. Darley, J. G. Clinical Aspects and Interpretation of the Strong Vocational
   Interest Blank. New York: The Psychological Corporation, 1941.

2. Drake, L. E. “A Social I.E. Scale for the Minnesota Multiphasic
   Personality Inventory.” Journal of Applied Psychology,
   XXX (1946), 51-54.

3. Meehl, P. E. and Hathaway, S. R. “The K Factor as a Suppressor
   Variable in the Minnesota Multiphasic Personality Inventory.”
   Journal of Applied Psychology, XXX (1946), 525-564.

4. Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford
   University, California: The Stanford University Press, 1943.



AWARD IN STUDENT PERSONNEL RESEARCH


C. GILBERT WRENN
University of Minnesota 

Nominations for the Award in Student Personnel Research 
may now be submitted to the undersigned members of the Com¬ 
mittee on Awards of the Council of Guidance and Personnel 
Associations. At a meeting in Toronto, Canada, July 9th and 
10th, 1949, the Board of Representatives of the Council appointed
the Committee on Awards to report at the spring meeting
in 1951. The award to be given is not a monetary consideration,
but is to be in the form of a statement of recognition by
the Board of Representatives of the Council of Guidance and 
Personnel Associations. It is planned to make announcement 
of the project or projects selected on Council Day each year 
and to give publicity concerning the selections through pro¬ 
fessional journals. It is hoped that such recognition will not only 
serve to call national attention to significant research already 
completed, but will stimulate further basic research in the field 
of student personnel. 

Although the Council of Guidance and Personnel Associa¬ 
tions is concerned with personnel work and personnel research 
in industry, business, government, and education, the projects 
to be considered for the first award are those which were com¬ 
pleted within the area of personnel work with students in ele¬ 
mentary school, high school, college, and university. 

The committee has decided to limit its consideration of re¬ 
search for the first award or awards to studies which were pub¬ 
lished in some form during the period July 1, 1946, through 
June 30, 1949. It is recognized that there is much valuable re¬ 
search unpublished as yet or that may never be published, but 
the inclusion of all unpublished studies would place an unmanageable
burden upon the committee. Future committees
may be more inclusive at this point and at the same time cover
a more restricted range of time.

It may be necessary to grant two awards, one for research 
conducted by an individual, another for research conducted by 
an institution or agency. An Honorable Mention List will also 
be prepared. 

Nominations of studies may be made by any member of the 
constituent organizations of CGPA, whether the author of the 
study or not, to any member of the Awards Committee. The 
committee will depend rather heavily upon such nominations 
although it may in addition review the literature and supple¬ 
ment the nominations made from the field. Nominations may 
be made through July 31, 1950.

The research may have been completed by an individual, a 
group of individuals, or an agency. The individual or individuals 
concerned need not be members of any constituent organization
of CGPA. The nominations should clearly state the fundamental
contribution that the research study has made to student
personnel work at any level, together with a statement of the limitations
inherent in the research. The nominator should state as fully
as possible why he thinks the particular study should be given the 
award. Wherever possible the nominator should send two or more 
copies of the research study for examination by the committee. 

It is essential to define what is meant by both “research” and 
“student personnel work.” The committee has adopted the defi¬ 
nition of research given in Carter V. Good’s Dictionary of 
Education: “Research is the careful unbiased investigation of a
problem, based insofar as possible upon demonstrable facts and 
involving refined distinctions, interpretation, and usually some 
generalization.” The research to be considered may fall in either 
of two general classifications: studies involving directly any of 
the personnel services listed below; secondly, educational, psy¬ 
chological, or sociological studies of a more basic nature that 
contribute fundamentally to a change or development in any of 
the listed personnel services. 

The definition of student personnel work is condensed directly 
from a statement of the Study Commission of the Council of 
Guidance and Personnel Associations at the Chicago meeting
in 1949. The services ordinarily to be interpreted as student
personnel services at various levels of education are the follow¬ 
ing: 

1. The interpretation of the school to the individual.

2. The maintenance of personnel records and the development
of their use. 

3. The provision of competent counseling to assist the in¬ 
dividual in achieving his best educational, vocational, and 
personal adjustment. 

a. This service will have access to psychological testing and 
such other special diagnostic services. 

b. This service will give vocational information and will be 
closely correlated with the placement program. 

c. This service will supplement the counseling efforts of 
classroom teachers. 

4. Physical and mental health services. 

5. Remedial services in such areas as speech, hearing, reading 
and study habits. 

6. Supervision and integration of housing and food services. 

7. A program of activities designed to induct the individual 
into his new life and environment as a member of the school 
community. 

8. The encouragement and supervision of group activities 
significant to the individual. 

9. A program of recreational activities designed to promote 
lifetime interests and skills appropriate to the individual. 

10. The treatment of discipline as a learning experience. 

11. Financial or similar aid.

12. Opportunities for securing help through part-time and sum¬ 
mer employment. 

13. Assistance to the individual in finding appropriate em¬ 
ployment when leaving school and later in achieving oc¬ 
cupational adjustment and advancement. 

14. Enrichment of the life of the individual by providing learn¬ 
ing and experiences in the area of spiritual and ethical 
values. 

15. Provision of opportunities for making socially desirable 
adjustments in relation to the opposite sex. 

16. The continuing evaluation of student personnel services 
in order to make them more effective in the life of the in¬ 
dividual. 

The members of the Committee on Awards are: 

Dr. Mitchell Dreese 

George Washington University, Washington, D. C. 

Dean Clifford Houston 

University of Colorado, Boulder, Colorado

Dr. Warren K. Layton

Detroit Public Schools, Detroit, Mich. 

Dean Hilda Threlkeld 

University of Louisville, Louisville, Ky.

Dr. C. Gilbert Wrenn, Chairman 
University of Minnesota, Minneapolis, Minn. 



QUICK ESTIMATION OF MULTIPLE R 

WILLIAM LEROY JENKINS 
Lehigh University 

By the short-cut method described below, the multiple R for 
a test battery can be estimated in a few minutes with a degree 
of accuracy sufficient for many practical purposes. Even if a 
Doolittle solution is finally obtained, the method provides a 
preliminary estimate and a useful cross-check against serious 
blunders in computation. 

Although intended only as a rough-and-ready approximation, 
the short-cut has shown so far an astonishing agreement with 
Doolittle multiple R’s. In no case has the difference exceeded 
.02 and in a set of 20 five-variable problems the mean dis¬ 
crepancy was only .005. 

Method with Example 

1. Arrange the matrix in descending order of validities. Con¬ 
vert r’s to E’s using Table 1. 

            r-matrix                            E-matrix

      Val.    B      C      D             Val.    B      C      D

A     .60    .50    .40    .30       A    20.0   13.4    8.4    4.6

B     .50           .20    .20       B    13.4           2.0    2.0

C     .40                  .20       C     8.4                  2.0

D     .30                            D     4.6

2. Compute the product of the validity E of the first test
(Primary) and the intercorrelation E between the first two
tests. Find this product on the ordinate scale of Figure 1¹ and
move across interpolating between the diagonal lines for the 
validity E of the second test (Secondary). From this intersec¬ 
tion move vertically to the scale of Added E. Add the Primary 
to the Added E to obtain the multiple E for the first two tests. 

Primary    Inter.    Product    Secondary    Added E    Multiple E

  20.0      13.4       268        13.4         3.7       23.7 (AB)

¹ The chart in Figure 1 is too small for convenient use. The author will be glad to
furnish without charge a photoprint reproduction of the original 8½" x 11" chart on
cross-section paper.


TABLE 1

Conversion of r to E

  r      E        r      E        r      E        r      E        r      E

 .10    0.5      .30    4.6      .50   13.4      .70   28.6      .90   56.4
 .11    0.6      .31    4.9      .51   13.8      .71   29.6      .91   58.5
 .12    0.7      .32    5.3      .52   14.6      .72   30.6      .92   60.8
 .13    0.9      .33    5.6      .53   15.2      .73   31.7      .93   63.2
 .14    1.0      .34    6.0      .54   15.8      .74   32.7      .94   65.9
 .15    1.1      .35    6.3      .55   16.3      .75   33.8      .95   68.8
 .16    1.3      .36    6.7      .56   17.2      .76   35.0      .96   72.0
 .17    1.5      .37    7.1      .57   17.8      .77   36.2      .97   75.7
 .18    1.6      .38    7.5      .58   18.5      .78   37.4      .98   80.1
 .19    1.8      .39    7.9      .59   19.3      .79   38.7      .99   85.9
 .20    2.0      .40    8.4      .60   20.0      .80   40.0
 .21    2.2      .41    8.8      .61   20.8      .81   41.4
 .22    2.5      .42    9.3      .62   21.5      .82   42.8
 .23    2.7      .43    9.7      .63   22.3      .83   44.2
 .24    2.9      .44   10.2      .64   23.2      .84   45.7
 .25    3.2      .45   10.7      .65   24.0      .85   47.3
 .26    3.4      .46   11.2      .66   24.9      .86   49.0
 .27    3.7      .47   11.7      .67   25.8      .87   50.7
 .28    4.0      .48   12.3      .68   26.7      .88   52.5
 .29    4.3      .49   12.8      .69   27.6      .89   54.4
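
The tabled values appear to agree with the index of forecasting efficiency, E = 100(1 − √(1 − r²)). That identification is an inference from the table entries and is not stated in the text, so the following Python sketch should be read as an assumption; the function names `r_to_e` and `e_to_r` are illustrative only.

```python
import math


def r_to_e(r: float) -> float:
    """Convert a correlation r to E, assuming E = 100 * (1 - sqrt(1 - r^2))."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


def e_to_r(e: float) -> float:
    """Invert the assumed relation to recover r from E."""
    k = 1.0 - e / 100.0
    return math.sqrt(1.0 - k * k)


# Print a few conversions for comparison with the table.
for r in (0.30, 0.50, 0.60, 0.80):
    print(f"r = {r:.2f}  E = {r_to_e(r):.1f}")
```

Under this assumption a computed value may occasionally differ from a tabled entry in the last digit, so the printed table remains the reference for the method as described.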




Figure 1. Chart for Added E

The dotted lines in the upper left show the method of finding Added E for step i 
of the problem in the text. 

3. Compute the product of the multiple E for the first two 
tests (Primary) and the larger of the intercorrelations of the 
third test with the first and second. Using this product and the
validity of the third test as Secondary, find the Added E and
the new multiple E. 

Primary    Inter.    Product    Secondary    Added E    Multiple E

  23.7       8.4       199         8.4         1.7       25.4 (ABC)

4. Continue in a similar manner, always using the largest of 
the intercorrelations of the new test with those already form¬ 
ing the multiple. 

Primary    Inter.    Product    Secondary    Added E    Multiple E

  25.4       4.6       117         4.6         0.6       26.0 (ABCD)

5. Convert the final multiple E to multiple R by reference
to Table 1. 

Multiple E 26.0 

Multiple R .67 (Doolittle .673) 
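
The bookkeeping of the steps above can be sketched in Python. The chart itself cannot be reproduced here, so the three Added E readings reported in the worked example are supplied as a lookup table; the names `shortcut_multiple_e` and `chart_readings` are hypothetical, and the final conversion from E to R uses the assumed relation E = 100(1 − √(1 − r²)) in place of a table lookup.

```python
import math

# E values from the worked example: validities and intercorrelations.
validity_e = {"A": 20.0, "B": 13.4, "C": 8.4, "D": 4.6}
inter_e = {
    frozenset("AB"): 13.4, frozenset("AC"): 8.4, frozenset("AD"): 4.6,
    frozenset("BC"): 2.0, frozenset("BD"): 2.0, frozenset("CD"): 2.0,
}

# Added E as read from the chart, keyed by (product, secondary E).
# Only the three readings reported in the text are available here.
chart_readings = {(268, 13.4): 3.7, (199, 8.4): 1.7, (117, 4.6): 0.6}


def shortcut_multiple_e(validity_e, inter_e, chart):
    """Accumulate the multiple E test by test, as in steps 1 to 4."""
    # Step 1: order the tests by descending validity.
    order = sorted(validity_e, key=validity_e.get, reverse=True)
    included = [order[0]]
    multiple_e = validity_e[order[0]]
    for test in order[1:]:
        # Largest intercorrelation E of the new test with those already included.
        inter = max(inter_e[frozenset((test, t))] for t in included)
        product = round(multiple_e * inter)
        # Added E comes from the chart; here it is a lookup of known readings.
        multiple_e += chart[(product, validity_e[test])]
        included.append(test)
    return multiple_e


multiple_e = shortcut_multiple_e(validity_e, inter_e, chart_readings)
# Assumed inverse of E = 100 * (1 - sqrt(1 - r^2)).
multiple_r = math.sqrt(1.0 - (1.0 - multiple_e / 100.0) ** 2)
print(f"multiple E = {multiple_e:.1f}, multiple R = {multiple_r:.2f}")
```

For any other problem the lookup would have to be replaced by actual readings from the chart, since no formula for Added E is given in the text.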

It will be observed that the process is one of building up 
the multiple by treating the successive steps as individual 
three-variable problems, which was the basis of a method² previously
published. In the present short-cut, however, the work
is considerably reduced, apparently without any serious loss of 
accuracy. 
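
As a cross-check on the example, the exact multiple R can be computed from the standardized normal equations: solve the intercorrelation system for the beta weights, then take R² as the sum of each beta times its validity. The sketch below uses ordinary Gauss-Jordan elimination in Python rather than the Doolittle worksheet layout; the helper name `solve` is illustrative.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Validities and intercorrelations from the r-matrix of the example.
validities = [0.60, 0.50, 0.40, 0.30]
intercorrelations = [
    [1.0, 0.5, 0.4, 0.3],
    [0.5, 1.0, 0.2, 0.2],
    [0.4, 0.2, 1.0, 0.2],
    [0.3, 0.2, 0.2, 1.0],
]

betas = solve(intercorrelations, validities)
multiple_r = math.sqrt(sum(b * v for b, v in zip(betas, validities)))
print(f"exact multiple R = {multiple_r:.3f}")
```

The result can be compared with the Doolittle figure quoted in the example and with the short-cut estimate of .67.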

² Jenkins, W. L. “A Quick Method for Multiple R and Partial r’s.” Educational
and Psychological Measurement, VI (1946), 173-286.

ERRATUM 

In the article by William Leroy Jenkins which appeared in the Spring, 1950, issue
of this journal the figure at the bottom of page 143 should be .79 instead of .89. 



RECENT PUBLICATIONS RECEIVED 

Brouwer, Paul J. Student Personnel Services in General Education.
Washington: American Council on Education, 1949. 317
pp. $3.50.

Carter, Homer L. J. and McGinnis, Dorothy J. Reading Manual
and Workbook. New York: Prentice-Hall, 1949. mo pp.
$1.75.

Cavan, Ruth S.; Burgess, Ernest W.; Havighurst, Robert J. 
and Goldhamer, Herbert. Personal Adjustment in Old 
Age. Chicago: Science Research Associates, 1949. io 4 pp. 
$2.95.

Cronbach, Lee J. Essentials of Psychological Testing. New York:
Harper & Bros., 1949. 475 pp. $4.50.

Freeman, Frank S. Theory and Practice of Psychological Testing.
New York: Henry Holt & Co., 1950. 518 pp. $3.50.

Goodenough, Florence L. Mental Testing, Its History, Principles
and Applications. New York: Rinehart & Co., 1949. 609 pp.
$5.00.

Gray, Robert D. and Staff. Selected Personnel Practices of Large
Employers in Los Angeles County. Pasadena: Industrial Relations
Section, California Institute of Technology. Circular
No. 18. 12 pp. $1.00.

Gray, Robert D. and Staff. Survey of Selected Personnel Practices 
in Los Angeles County. Pasadena: Industrial Relations Section,
California Institute of Technology. Bulletin No. 17.
94 pp.

Hurd, A. W. Problems of Collegiate Success or Failure with Particular 
Reference to Professional Schools of Medicine. Richmond: 
Bureau of Educational Research and Service, Medical Col¬ 
lege of Virginia. 124 pp. $2.50. 

Johnson, Palmer O. Statistical Methods in Research. New York:
Prentice-Hall, 1949. 377 pp. 

Lawrence, Merle. Studies in Human Behavior. A Laboratory Man¬ 
ual in General Psychology. Princeton: Princeton University 
Press, 1949. 184 pp. $3.50. 

Mathewson, Robert H. Guidance Policy and Practice. New York:
Harper & Bros., 1949. 293 pp. $3.00.

Mosteller, Frederick, Hyman, Herbert, McCarthy, Philip J.,
Marks, Eli S. and Truman, David B. The Pre-Election
Polls of 1948. New York: Social Science Research Council,
1949. 396 pp. $2.50 (paper), $3.00 (cloth).

O’Kelly, Lawrence I. Introduction to Psychopathology. New York: 
Prentice-Hall, 1949. 736 pp. 

Parten, Mildred B. Surveys, Polls and Samples. New York: Harper 
& Bros., 1950. 624 pp. $5.00.

Pease, Katharine. Machine Computation of Elementary Statistics. 
New York: Chartwell House, 1949. 208 pp. 

Pray, Kenneth L. M. Social Work in a Revolutionary Age and other 
papers. Philadelphia: University of Pennsylvania Press, 
1949. 308 pp. $4.00.

Reynolds, Lloyd G. and Shister, Joseph. Job Horizons: A Study
of Job Satisfaction and Labor Mobility. New York: Harper
& Bros., 1949. iot pp. $2.25.

Robinson, Virginia P. Dynamics of Supervision under Functional
Controls. Philadelphia: University of Pennsylvania Press,
1949. 154 pp. $2.25.

Sheldon, William H. Varieties of Delinquent Youth. An Introduction 
to Constitutional Psychology. New York: Harper & Bros., 
1949. 899 pp. $8.00. 

Snygg, Donald and Combs, Arthur W. Individual Behavior: A 
New Frame of Reference for Psychology. New York: Harper 
& Bros., 1949. 386 pp. $3.50. 

Stone, Calvin P. Case Histories in Abnormal Psychology. Stanford: 
Stanford University Press, 1949. 106 pp. $1.75. 

Stuit, Dewey B., Dickson, Gwendolen S., Jordan, Thomas F. 
and Schloerb, Lester. Predicting Success in Professional 
Schools. Washington: American Council on Education, 1949. 
187 pp. $3.00. 

Super, Donald E. Appraising Vocational Fitness By Means of Psy¬ 
chological Tests. New York: Harper & Bros., 1949. 727 pp. 
$6.00. (Text edition, $5.00.) 

Thorndike, Robert L. Personnel Selection: Test and Measurement 
Techniques. New York: John Wiley & Sons, 1949. 358 pp. 
$4.00. 

Travers, Robert M. W. How to Make Achievement Tests. New York: 
The Odyssey Press, 1950. 180 pp. $2.25.

Wallin, J. E. Wallace. Children with Mental and Physical Handicaps.
New York: Prentice-Hall, 1949. 549 pp. $5.00.

Weitzman, Ellis and McNamara, Walker J. Constructing Class¬ 
room Examinations: A Guide for Teachers. Chicago: Science 
Research Associates, 1949. 153 pp. $3.00. 

Williamson, E. G. (Ed.) Trends in Student Personnel Work. Minneapolis:
University of Minnesota, 1949. 417 pp. $5.00.

Manpower Branch, Human Resources Division. Office of Naval 
Research. The Development of a Test for Selecting Research 
Personnel. Pittsburgh: American Institute for Research, 
1950. 33 pp.



THE CONTRIBUTORS 


Sidney Adams—Ph.D., University of California, 1933. Job-description
writer, U. S. Employment Service, Occupational Research
Program, 1935-1937. Research in trade, clinical and other tests,
Tennessee Valley Authority, 1937-1946. World War II service in
test research, aviation psychology and clinical psychology, 1941-
1945. Test developer, U. S. Civil Service Commission, at present.

Robert B. Ammons—Ph.D., University of Iowa, 1946. Assistant
Professor of Psychology, Director of Psychological Service for Children,
University of Denver, 1946-1948. Assistant Professor of Psychology,
Tulane University, 1948-. Dept. of Psychology, University
of Louisville, 1949-. Member, American Psychological Association,
American Statistical Association, Psychometric Society, Sigma Xi.

P. C. Baker—M.S., Purdue University, 1948. Assistant to the 
Director, Division of Educational Reference, Purdue University, 
1949-. 

Benjamin Balinsky—Ph.D., New York University, 1940. Psychologist,
Bellevue Hospital, 1935-1939. Senior Psychologist, later Head,
Psychologist Consultation Service, National Youth Administration,
1939-1942. Civilian Psychological Consultant to War Department,
1942. Psychologist and Counselor, Consultation Service of Vocational
Advisory Service, 1942-1947. Evening and summer teaching,
City College of New York, 1942-. Instructor, City College of New
York, 1947-. Part-time Psychological Consultant to Vocational Services
Dept., United Service for New Americans. Author of articles on
tests and measurements in vocational and clinical fields. Fellow, American
Psychological Association, Division of Counseling and Guidance
and Division of Clinical Psychology. Diplomate in Clinical Psychology,
American Board of Examiners in Professional Psychology. Member,
American Association for the Advancement of Science, American
Academy of Political and Social Sciences.

Hubert E. Brogden—Ph.D., University of Illinois, 1939. Instruc¬ 
tor, Ohio State University, 1939-1940. Statistician, U. S. Public 
Health Service, 1940-1942. Personnel Research Section, AGO, 1943—. 
Author of articles in Psychometrika, Journal of Educational Psychology,
Psychological Monographs and Journal of General Psychology.

Robert Callis —B.Ed., Southern Illinois Normal University, 1942. 
With the U. S. Navy, 1942-1946. Counselor, General College, and 
graduate student, University of Minnesota, 1946.

Raymond B. Cattell—Ph.D., University of London, 1929; D.Sc.
(ibid.), 1939. Director of City Child Guidance Clinic, Leicester,
England, 1932-1937. Research Associate to Professor Thorndike,
Teachers College, Columbia University, 1937-1938. G. Stanley Hall
Professor, Clark University, 1939-1941. Lecturer, Harvard University
and Civilian Consultant, Adjutant General’s Office, 1941-1944.
Research Professor in Psychology, University of Illinois, 1945-. Author
of A Guide to Mental Testing, Crooked Personalities in Childhood
and After, General Psychology, The Description and Measurement of
Personality, and other books on personality and social psychology,
as well as of research articles in American and British journals.
Member, American Psychological Association. Fellow, British Psychological
Society.

Orrin H. Cross—M.S., University of Minnesota, 1945. Teaching 
Assistant in Psychology, University of Minnesota, 1943-1944. Coun¬ 
selor I, U. S. Employment Service, 1945-1946. Lecturer in Psychol¬ 
ogy, University of Pittsburgh, 1948-1949. Assistant Professor of 
Psychology, University of Alabama, 1947- (On leave, 1948-1949).
Associate Member, American Psychological Association. Member, 
Eastern Psychological Association, Southern Society for Philosophy 
and Psychology, Alabama Academy of Science. 

James W. Curtis—M.S., University of Kentucky, 1938. Graduate 
Assistant, University of Kentucky, 1937-1938. Research Psychologist,
United States Forest Service, 1938-1939. Acting Head, Dept.
of Psychology, Pikeville College, 1939-1941. Personnel Consultant,
Classification Officer, Personnel Liaison Officer, U. S. Army Air
Forces, 1941-1947. Supervising Psychologist, Illinois Division of Vocational
Rehabilitation, 1947-. Supervising Psychologist, Springfield
(Ill.) Mental Hygiene Clinic, 1949-. Psychological Consultant to
Patton, Evans, Masters Medical Group, 1948-. Author of articles on 
attitudes, hypnosis, remedial reading, etc., in professional journals. 
Member, American Psychological Association, Southern Society for 
Philosophy and Psychology, Illinois Psychological Association, Ken¬ 
tucky Psychological Association, American Association for the Ad¬ 
vancement of Science. 

N. M. Downie—Ph.D., University of Syracuse, 1948. Instructor 
in Biology, Robert College, Istanbul, Turkey, 1936-1939. Instructor 
in Education and Graduate Assistant, Evaluation Service Center, 
Syracuse University, 1946-1948. Assistant Professor of Education, 
State College of Washington, 1948-. 

Frank M. du Mas—M.A., University of Virginia, 1941. Graduate 
Student, University of Virginia, 1941-1942. War work and military 
service, 1942-1945. Instructor in Psychology, University of Denver,
1945-1947. Research Assistant, University of Iowa, 1947-1948. Associate
Professor of Psychology, Florida State University, 1948-. On
contract, Office of Naval Research, under the guidance of the Amer¬ 
ican Council on Education. 

Allen L. Edwards—Ph.D., Northwestern University, 1940. Assist¬ 
ant in Psychology, Ohio State University, 1937-1938. Assistant in 
Psychology, Northwestern University, 1938-1940. Instructor in Psychology,
University of Akron, 1940-1941. Psychologist, Special Study
Group, Military Intelligence, War Dept., 1941-1942. Psychologist,
Overseas Branch, Office of War Information, 1942-1943. Assistant
Professor of Psychology, University of Maryland, 1943-1944. Consultant,
War Relocation Authority, 1943-1944. Associate Professor
of Psychology, 1944-1948; Professor of Psychology, 1948-, University
of Washington. Author of Statistical Analysis and Psychology:
A First Course in Human Behavior and author of articles on social 
psychology, statistics, and scale construction. Fellow, American Psy¬ 
chological Association. Member, American Statistical Association, 
Biometric Society, Western Psychological Association, Advisory 
Board, Washington Public Opinion Laboratory. President, State 
Psychological Association of Washington. Consulting Editor, Journal 
of Abnormal and Social Psychology, Journal of Applied Psychology. 

Katherine K. Fassett (Mrs. N. C. Fassett)—M.A., University of 
Wisconsin, 1946. Teaching Assistant, Dept. of Psychology, 1945-
1946; Personnel Assistant, Student Counseling Center, Sept.-Dec.,
1946; Counselor, Student Counseling Center, 1947-, University of 
Wisconsin. Associate Member, American Psychological Association, 
A.P.A. Division of Counseling and Guidance, Association of Mid¬ 
western College Psychiatrists and Clinical Psychologists. Member, 
Midwestern Psychological Association, American Association for the 
Advancement of Science. 

J. P. Guilford—Ph.D., Cornell University, 1927. Instructor in 
Psychology, University of Illinois, 1926-1927. Assistant Professor 
of Psychology, University of Kansas, 1927-1928. Associate Professor 
of Psychology, 1928-1940; Director, Bureau of Instructional Re¬ 
search, 1938-1940, University of Nebraska. Professor of Psychology, 
University of Southern California, 1940-1942. Director, Psychological
Research #3, Santa Ana Air Base; Director, Psychological Research
#2, Aviation Cadet Center, San Antonio; Chief, Field Research
Unit, Army Air Forces Training Command Headquarters,
Fort Worth, Texas; Chief, Dept. of Records and Analysis, Army
Air Forces School of Aviation Medicine, Randolph Field, with rank
of Colonel, 1942-1946. Professor of Psychology, University of Southern
California, 1946-. Fellow, American Association for the Advancement
of Science, American Psychological Association. Member, Psychometric
Society, Society of Experimental Psychologists, Western
Psychological Association, Society of Mathematical Statistics, Southern
California Psychological Association.

Arlene B. Heist —M.A., University of Illinois, 1948. Junior Coun¬ 
selor, S.L.A., University of Minnesota, 1948-1950. 

Paul A. Heist—M.A., University of Illinois, 1948. Graduate Student,
University of Minnesota, at present.

William Leroy Jenkins—Ph.D., University of Michigan, 1936. Instructor,
Assistant Professor, Lehigh University, 1935-43. Research
Associate, University of California Division of War Research, 1943-44.
Supervisor, Training Aids, Columbia University Division of War
Research, Submarine Training Section, 1944-45. Associate Professor
of Psychology, 1946-1947; Professor of Psychology, 1947-, Lehigh
University. Author of articles on cutaneous sensitivity. Member,
American Psychological Association.

C. H. Lawshe—Ph.D., Purdue University, 1940. Member of Faculty,
Division of Education and Applied Psychology, 1941-; Professor
of Psychology, 1947-; Research Associate in Statistical Laboratory,
1948-, Purdue University. Private Consultant to Management,
1942-. Diplomate in Industrial Psychology, American Board of Examiners
in Professional Psychology. Fellow, American Psychological
Association, American Association for the Advancement of Science.
Author, Principles of Personnel Testing. Co-author, with Joseph
Tiffin and E. J. Asher, of Workbook for Psychology of Normal People.
Author of articles in professional journals.

William B. Michael—Ph.D., University of Southern California,
1947. Teaching Assistant in Mathematics, 1942-1943; Instructor in
Engineering Mathematics, E.S.M.W.T., 1942-1945, California Institute
of Technology. Instructor in Mathematics, Pasadena Junior
College, 1943-1944. Lecturer in Mathematics, 1944-1947; Lecturer
in Education and Psychology, 1945-1947, University of Southern
California. Research Associate, College Entrance Examination Board,
1947-1948. Assistant Professor of Psychology, Princeton University,
1947-. Associate Member, American Psychological Association. Member,
Mathematical Association of America, Institute of Mathematical
Statistics, American Statistical Association, Psychometric
Society, Western Psychological Association, Southern California Psychological
Association, Phi Beta Kappa, Sigma Xi, Phi Kappa Phi,
Phi Delta Kappa.

Adam Poruben, Jr.—Ed.D., Columbia University, 1943. Teacher
of Science and Mathematics, Saunders Trades School, Yonkers, N.
Y., 1934-1944. Personnel Research Technician, Personnel Research
Section, AGO, 1944-1945. Research Psychologist, Encyclopedia Britannica
Films, 1945-1946. Staff Psychologist, Personnel Division,
Metropolitan Life Insurance Co., 1946-. Associate Member, American
Psychological Association. Member, American Statistical Association,
American Educational Research Association, Committee on
Psychological Testing, Commerce and Industry Association of New
York.

George Spache—Ph.D., New York University, 1938. Teacher,
public schools, N. Y. City, 1930-1936. Psychologist, public and
private schools, N. Y. City and Westchester County, 1936-1944.
Psychologist, Horace Greeley School, Chappaqua, N. Y., 1944-1948.
Consulting Psychologist in Industry, Rohrer, Hibler and Replogle,
1948. Psychologist, Pupil Personnel Services, Westchester County,
1948-. Instructor in Education, New York University, 1944-. Instructor,
Rutgers University, 1948-. Author of articles on reading,
spelling, etc., The Binocular Reading Test and An Incomplete Sentence
Test for Use in Industry. Member, American Psychological Association,
Committee on Diagnostic Reading Tests.

Martin Spiaggia—M.A., New York University, 1949. Served with
the U. S. Armed Forces, Military Intelligence, 1942-1945. Intern
Clinical Psychologist, New York State institutions, 1947-1948. Counselor
and Research Assistant, City College Vocational Advisement
Unit, 1948-. Member, American Psychological Association, Eastern
Psychological Association.

Roger G. Stewart—M.A., University of Illinois, 1948. Assistant
in Psychology, University of Illinois, 1949-. 

Erwin K. Taylor: Ph.D., Northwestern University, 1941. Personnel
Examiner, Illinois State Civil Service Commission, 1942-1943.
Personnel Research Section, AGO, 1945-. Fellow, American Psychological
Association. Member, Psychometric Society, Civil Service Assembly of
U. S. and Canada.

Maurice E. Troyer: Ph.D., Ohio State University, 1935. Teacher of
biology, athletic coach and Superintendent, Bureau Township Schools,
1923-1929. Assistant Professor of Psychology and Dean of Men, Bluffton
College, 1930-1932. Instructor in charge of remedial program, Ohio
State University, 1932-1936. Assistant Professor and Associate
Professor of Education, Syracuse University, 1936-1940. Associate in
Evaluation, Cooperative Study in Teacher Education, American Council
on Education, 1940-1943. Professor of Education and Director of Bureau
of School Services, Syracuse University, 1943-1945. Director of
Evaluation Service Center, 1945-1947, and Director of Psychological
Service Center, 1947-1949. Vice-President in charge of curriculum and
instruction, Japan International Christian University, 1949-. Author,
with Pressey, of Laboratory Workbook in Educational Psychology; with
Pace, of Evaluation in Teacher Education; with Syracuse School of
Education Faculty, of A Functional Program in Teacher Education; of
articles in yearbooks and professional journals. Member, American
Psychological Association, American College Personnel Association,
American Educational Research Association, American Association for
the Advancement of Science, Phi Delta Kappa, Sigma Xi.

C. Gilbert Wrenn: Ph.D., Stanford University, 1932. Vocational
Counselor, Stanford University, 1928-1936. Associate Director, General
College; Associate Professor, Educational Psychology, 1936-1938;
Professor of Educational Psychology, 1938-, University of Minnesota.
On military leave, serving in the Bureau of Naval Personnel and
Pacific Area as Personnel Officer, 1942-1946. Associate, American
Youth Commission, 1939-1941. Consultant, Student Personnel Teacher
Education Commission, American Council on Education, 1939-1942.
President, National Vocational Guidance Association. Vice-President,
Council of Guidance Personnel Association, 1946-. Author and coauthor,
Student Personnel Problems, Studying Effectively, Aids to Group
Guidance, Time on Their Hands, and of professional articles.

Wayne S. Zimmerman: Ph.D., University of Southern California, 1949.
Personnel Consultant Assistant, U.S.A.A.F., 1942-1945. Veterans
Counselor, 1945-1946; Research Psychologist, 1946-1948, University of
Southern California. Psychologist, Sears Roebuck and Company, 1949-.
Author of articles in professional journals.




EDUCATIONAL and 
PSYCHOLOGICAL 



MEASUREMENT 



VOLUME TEN, NUMBER THREE, AUTUMN, 1950 


A Study of General Education at Syracuse University with Special
Attention to the Objectives. N. M. Downie, C. R. Pace and
M. E. Troyer. 359

Educational Growth as Shown by Retests on the Graduate Record
Examination. Joseph C. Heston. 367

The Assessment of the Academic Aptitude of the Graduate Student.
Robert M. W. Travers and Wimburn L. Wallace. 371

Measuring Originality in the Physical Sciences. Milton M. Mandell. 380

Probability Approach to Forecasting University Success with Measured
Grades as the Criterion. L. J. Lins. 386

Preferences and Behavior Ratings of Dominance. William R. Birge. 392

Reproducible Scales and the Assumption of Normality. Robert G.
Smith, Jr. 393

A Factorial Study of Beliefs. J. W. Holley and Claude E. Buxton. 400

Opinion and Action: A Study in Validity of Attitude Measurement.
C. Robert Pace. 411

Estimating Intelligence by Interview. Joseph V. Hanna. 420

Inclusion of “None of These” Makes Spelling Items More Difficult.
Marcia Boynton. 431

A Table and an ABAC for Testing the Significance of Rho. Frank M.
DuMas. 433

Recent Publications Received. 437

The Contributors . 43 ^ 


















A STUDY OF GENERAL EDUCATION AT SYRACUSE 
UNIVERSITY WITH SPECIAL ATTENTION 
TO THE OBJECTIVES 

N. M. DOWNIE 
State College of Washington 

C. R. PACE and M. E. TROYER 
Syracuse University 

During the academic year, 1947-1948, Syracuse University carried out
an all-university self-survey. Among the concerns of this survey was
a study of the status of the program of general education of the
University.

The questions that this study proposed to answer were: 

1. What objectives of general education do the students believe to
be important?

2. How much of these objectives do the students think that they are
achieving in their education at Syracuse?

3. How do the members of the faculty rate the importance of these
same objectives and what responsibility does each staff member
assume toward helping students achieve these objectives?

4. What is the achievement in general education of Syracuse students
as measured by a standardized test of general education?

5. How well-informed are these same students on current events?

6. What are the opinions of the students on some controversial or
widely-discussed issues of the day?

This paper will be concerned with the first three of the above 
questions. The findings of the last three will be reported in 
later issues of this journal. 

Members of the Senior Class, 1948, and of the Sophomore Class, 1950,
from the following five colleges of the University participated in
the study: Applied Science, Business Administration, Fine Arts, Home
Economics and Liberal Arts. An entire day late in the fall term of
1947 was given over to the testing program involving students. Each
student took Form X, the 1947 edition, of the Cooperative General
Culture Test and the Time Magazine Current Affairs Test, and responded
to an opinion scale on current issues and to a list of objectives of
general education. These same objectives of general education were
included in the General Questionnaire completed by each staff member
as a part of the survey.

The Objectives of General Education. —A list of objectives of 
general education developed by a committee of the American 
Council on Education and reported in A Design for General 
Education for Members of the Armed Forces 1 was modified and 
used as the basis for this part of the study. This list consists of 
eighteen items as shown in Table 1. 

The student was asked to consider each item in two ways. First, “How
important do you consider this knowledge, skill, or understanding as
a goal of your education?” Each item was to be marked “very
important,” “important,” “of some importance,” “of hardly any
importance,” or “of no importance.” Second, the student was asked to
answer the question, “How much are you getting of this knowledge,
skill or understanding from college so far?” In this case the answer
categories were “much,” “some,” or “little or nothing.”

Each staff member was asked to consider each objective in two ways.
The first question was, “How important do you consider this objective
as a goal of general education for all students?” This corresponded
to the first ratings of the students. The second question was, “What
responsibility does your area of instruction assume for helping
students make progress toward the attainment of this objective?” Each
item was to be marked “direct responsibility,” “incidental
responsibility,” or “outside my area of responsibility.” These
objectives were rated by 689 faculty members in the various colleges.

The responses of both faculty members and students in rating the
importance of these objectives were converted into percentages and
the results are shown in Table 1.

1 Report of Committees and Conferences, Series I, Vol. VIII, No. 18.
Washington, D. C.: The American Council on Education, June, 1944.

The responses of the seniors and the sophomores were first compared.
For the majority of the items there were no significant differences
between the responses of the two classes when

TABLE 1

Ratings of the Importance of the Objectives of General Education by Faculty
and Students (Percentages)

Item  Objective
        Group   1     2     3     4     5

  1   Developing good health habits
          F    48    30    16     3     1
          S   [?]    29    11     3     2
  2   Understanding the basis of personal and community health
          F    38    39    19     1     1
          S    41    37    17     3     2
  3   Writing clearly and effectively
          F    64    28     6     *     *
          S    57    34     8     1     *
  4   Speaking easily and well
          F    59    33     7     *     *
          S    75    22     3     *     0
  5   Developing social competence and social graces
          F    48    37    11     1     *
          S    63    30     6     1     0
  6   Understanding other people
          F    69    26     4     *     *
          S    85    14     1     0     *
  7   Preparing for a satisfactory family and marital adjustment
          F    46    34    14     3     2
          S    73    22     4     *     1
  8   Discovering personal strengths and weaknesses, abilities and limitations
          F    69    24     5     *     1
          S    71    26     3     *     *
  9   Understanding world issues and pressing social, political and economic problems
          F    57    34     7     1     0
          S    49    39    12     *
 10   How to participate effectively as a citizen
          F    61  3[?]     6     *     *
          S    49    42     7     1
 11   Understanding scientific developments and processes and their application in society
          F    37    44    17     *     *
          S    31    40    25     2     1
 12   How to think clearly, meet a problem and follow it to a right conclusion without guidance
          F    87    11     1     *     *
          S  8[?]   [?]     *     0     0
 13   Developing an understanding and enjoyment of literature
          F    28    45    23     2     1
          S    27    43    26     3     1
 14   Developing an understanding and enjoyment of art and music
          F    21  3[?]    34     2     1
          S    28    34    33     4     2
 15   Understanding the meaning and values in life
          F    60    27     9     1     1
          S    65    27     6     1     1
 16   Developing a personal philosophy and applying it in daily life
          F    55    30    12     1     1
          S    57    30     8     2     2
 17   Making a wise vocational choice
          F    68    27     4     1     1
          S    88    10     1     *     *
 18   Preparing for a vocation
          F    59    30     8     1     1
          S    81    16     3     0

F = Faculty; S = Students; * = Less than 1%
1 = Very important; 2 = Important; 3 = Of some importance;
4 = Of hardly any importance; 5 = Of no importance


Chi square was used as a test of significant differences. However,
in rating the importance of item 5, “Developing social competence and
social graces,” the responses of the two classes were significantly
different at the 5 per cent level, with more seniors rating the item
“very important.” Item 10, “How to participate effectively as a
citizen,” was also found to be significantly different at the 5 per
cent level, again with more seniors rating the objective as “very
important.” Items 13 and 14, “Developing an understanding and
enjoyment of literature” and “Developing an understanding and
enjoyment of art and music,” were also found to be significantly
different at the 5 per cent level. For both of these objectives, more
seniors than sophomores rated the items as being “very important.”

Considering students’ estimates of the amount of each objective they
believe they are achieving, item 4, “Speaking easily and well,” was
significantly different at the 1 per cent level, with the seniors
indicating that they were receiving more of this objective than the
sophomores. Item 7, “Preparing for a satisfactory family and marital
adjustment,” was similar to item 4 in all respects. Item 9,
“Understanding world issues and pressing social, political and
economic problems,” was significantly different at the 5 per cent
level, with the sophomores recording that they were receiving more of
this item than the seniors. Item 12, “How to think clearly,” and item
14, “Developing an understanding and enjoyment of art and music,”
were also significantly different at the 5 per cent level, with the
seniors receiving more in both cases.

Thus, in the few instances where there were differences between
seniors and sophomores, the seniors tended to regard the objective as
more important and to feel that they had made more progress toward
its attainment than the sophomores.

The ratings of the faculty and of the students were tested for
significant differences, again using Chi square. Twelve of the items
were rated differently by the two groups. These are enumerated below.
Item 1, “Developing good health habits,” was considered to be more
important by the students, 5 per cent level. “Writing clearly and
effectively,” item 3, was rated more important by the faculty, 5 per
cent level. Items 4, 5, 6 and 7, “Speaking easily and well,”
“Developing social competence and social graces,” “Understanding
other people” and “Preparing for a satisfactory family and marital
adjustment,” were considered more important by the students, all at
the 1 per cent level. Items 9, 10 and 11, “Understanding world issues
and pressing social, political and economic problems,” “How to
participate effectively as a citizen” and “Understanding scientific
developments and processes and their application in society” were all
rated as being more important by the staff, all at the 1 per cent
level. Items 16, 17 and 18, “Developing a personal philosophy and
applying it to daily life,” “Making a wise vocational choice” and
“Preparing for a vocation” were all considered more important by the
students, all at the 1 per cent level.
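The Chi square comparisons described above operate on counts of respondents in each rating category, not on the percentages printed in the tables. A minimal sketch of such a test follows. The function names and the counts are hypothetical, chosen only for illustration (the article reports 689 faculty raters but does not give the number of students), and every rating category is assumed to have been chosen by at least one respondent.

```python
# Chi-square test of homogeneity for two groups rating one objective.
# All counts used with this sketch are hypothetical; the article prints
# percentages only.

def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    `table` holds one row per group and one column per rating category.
    Every column total must be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Upper-tail critical values of chi-square: (5 per cent, 1 per cent).
CRITICAL = {
    1: (3.841, 6.635),
    2: (5.991, 9.210),
    3: (7.815, 11.345),
    4: (9.488, 13.277),
}


def significance(statistic, dof):
    """Report the level at which the statistic is significant."""
    five_per_cent, one_per_cent = CRITICAL[dof]
    if statistic >= one_per_cent:
        return "1 per cent level"
    if statistic >= five_per_cent:
        return "5 per cent level"
    return "not significant"


if __name__ == "__main__":
    # Hypothetical counts: much / some / little or nothing.
    faculty = [60, 30, 10]
    students = [150, 40, 10]
    stat, dof = chi_square([faculty, students])
    print(round(stat, 2), dof, significance(stat, dof))
```

With five rating categories and two groups the test has four degrees of freedom; with three categories it has two.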

A comparison of how much of these objectives the students believed
they were achieving with the responsibility assumed by the faculty
members for the achievement of the objectives is shown in Table 2. A
study of this table shows that for most of the objectives, the “Much”
column for the students and the “Direct Responsibility” column of the
faculty contain the smallest percentages. The exceptions to this
over-all pattern are found in item 6, “Understanding other people,”
where 56 per cent of the students marked that they were receiving
“Much” and 33 per cent of the faculty considered this objective to be
their “Direct responsibility”; in item 12, “How to think clearly,”
where 69 per cent of the staff considered this objective to be their
direct responsibility and only 37 per cent of the students believed
they were receiving much toward the attainment of this objective; and
in item 18, “Preparing for a vocation,” where 55 per cent of the
faculty considered this objective to be their direct responsibility
and 48 per cent of the students stated they received much of it.

If we can assume that, in general, there should be some degree of
correspondence between the number of faculty members assuming
responsibility for an objective and the number of students who feel
they are making progress toward its attainment, then we can compare
these two ratings. When Chi square was computed for each item, it was
found that for all but two items there existed significant
differences at the 1 per cent level of confidence. The two for which
no differences were found were item 4, “Speaking easily and well,”
and item 8, “Discovering personal strengths and weaknesses, abilities
and limitations.” A further analysis of these significant differences
showed that, except in items 11, 12, and 17, the students were
attaining more of these objectives than the faculty was assuming
responsibility for. This situation might then be an indication of the
results of participation in extra-curricular


TABLE 2

Ratings of the Amount of the Objectives of General Education Received by the Students
and the Responsibility Assumed by the Faculty for the Achievement of Each
Objective (Percentage)

Item  Objective
        Group   1     2     3

  1   Developing good health habits
          F     9    27    63
          S    11    45    44
  2   Understanding the basis of personal and community health
          F    13    28    59
          S    11    45    44
  3   Writing clearly and effectively
          F    27    54    19
          S    22    51    27
  4   Speaking easily and well
          F    21    51    28
          S    25    48    27
  5   Developing social competence and social graces
          F    17    42    41
          S    35    47    18
  6   Understanding other people
          F    33    39    28
          S    56    36     8
  7   Preparing for a satisfactory family and marital adjustment
          F     7    21    72
          S    19    36    45
  8   Discovering personal strengths and weaknesses, abilities and limitations
          F    38    45    17
          S    41    46    13
  9   Understanding world issues and pressing social, political and economic problems
          F    21    35    44
          S    21    51    28
 10   How to participate effectively as a citizen
          F    16    41    43
          S    22    55    23
 11   Understanding scientific developments and processes and their application in society
          F    37    31    32
          S    23    41    36
 12   How to think clearly, meet a problem and follow it to a right conclusion without guidance
          F    69    22     9
          S    37    50    13
 13   Developing an understanding and enjoyment of literature
          F    15    26    59
          S    21    49    30
 14   Developing an understanding and enjoyment of art and music
          F    11    22    67
          S    21    30    49
 15   Understanding the meaning and values in life
          F    22    44    34
          S    25    51    24
 16   Developing a personal philosophy and applying it in daily life
          F    18    46    36
          S    24    50    26
 17   Making a wise vocational choice
          F    32    43    25
          S    27    44    29
 18   Preparing for a vocation
          F    55    29    16
          S    48    42    10

F = Faculty; S = Students
1 = For faculty, read “Direct Responsibility”; for students, read “Much”
2 = For faculty, read “Incidental Responsibility”; for students, read “Some”
3 = For faculty, read “Outside My Area of Responsibility”; for students,
    read “Little or Nothing”


activities and residence in fraternities and university houses as 
leading to the realization of some of these objectives of general 
education. 





It should also be borne in mind that, when a student was deciding
whether or not he was receiving various amounts of these objectives,
he was thinking of no specific courses, but was considering his
entire program covering the four or two years that he had been at
Syracuse. Each faculty member was considering only his limited area
of responsibility. Hence, the differences are actually much greater
than indicated by the data because of this difference in the scope of
the educational program being rated by the two groups.

The results were next considered in respect to the five different
colleges of the University. The ratings of the faculty and students
of the various colleges were tested for significant differences using
Chi square.

Comparing the results obtained on this check-list of general
education objectives within each of the five colleges studied throws
some light on the location of responsibility for their attainment and
the relative estimates of students’ progress toward their
accomplishment. For example, the two objectives concerning personal
and community health and the one concerning family and marital
adjustment were acknowledged as direct responsibilities by a much
higher per cent of the Home Economics faculty members than by faculty
members in other colleges. Correspondingly, larger numbers of home
economics students than students in other colleges felt they were
making progress toward these objectives. In contrast, there were no
appreciable inter-college differences on the objectives concerned
with effective speech and writing. Preparation for effective
citizenship and understanding current issues were acknowledged most
frequently by faculty and students in Liberal Arts and Business
Administration. Except in the College of Fine Arts, almost no faculty
members were taking any direct responsibility for helping students
understand and enjoy art and music; and almost no students, except in
Fine Arts, felt they were achieving much of this objective. These
comparisons between colleges are cited as illustrative. Insofar as
they are objectives of general education for all students there
should probably be a reconsideration of responsibility for their
promotion and students in all colleges should feel that they are
progressing toward them.




Findings 

As a result of having eighteen objectives of general education rated
by students and faculty members, the following conclusions can be
drawn:

1. The students of Syracuse University consider the attainment of
these objectives of general education as important goals of their
education.

2. In the curriculum, as it is now organized, the majority of the
students feel that they are making “some” but not “much” progress
toward the achievement of these goals.

3. The Faculty of Syracuse University considers these same objectives
to be important goals of education. However, there is a difference
between the importance placed on these various objectives by the
faculty and students. The ratings of twelve of these eighteen
objectives by the Faculty were significantly different from the
students’ ratings and eight were rated as being more important by the
students.

4. For the achievement of most of these objectives on the part of the
students, the majority of the faculty assume no direct
responsibility.

5. To the seniors and the sophomores most of these objectives were of
equal importance. Four of them, 5, 10, 13 and 14, were rated as being
significantly more important by the seniors.

6. For the majority of the objectives, the seniors and sophomores
felt that they were receiving about the same amounts, except for
items 4, 7, 12 and 14, which the seniors reported to be receiving
more of, and item 9, which the sophomores received more of.

7. In the five colleges the ratings of the importance of the
objectives, the amounts that the students were receiving and the
responsibility the faculty members assumed for the achievement of
each varied considerably from college to college. The importance
placed on an objective and the amount the students received more or
less depended on the curriculum of the individual college.



EDUCATIONAL GROWTH AS SHOWN BY RETESTS ON 
THE GRADUATE RECORD EXAMINATION 


JOSEPH C. HESTON 
DePauw University 

The Problem

Educators would like to achieve some objective measure of educational
growth to demonstrate the progress of students through a university
curriculum. One such method of evaluating this growth is offered
through the use of The Tests of General Education of the Graduate
Record Examination. DePauw University is now in a position to make an
analysis of the test-retest records of students who took the
Examination in 1946 (as sophomores) and repeated the same Examination
in 1948 (as seniors). DePauw is one of the universities where
sufficient students have been tested and then retested to make such
an analysis possible. Even here, however, the present analysis must
be restricted to women students, inasmuch as there were not
sufficient men sophomores tested in 1946 to make an analysis of men’s
records worthwhile. Therefore, the present analysis deals with the
sophomore versus senior records of 157 DePauw women students.

Results 

The most obvious question in this connection would be, how much gain
do these students show on the eight Tests of General Education
prepared by the Graduate Record Office? In Table 1 will be found the
mean score of these students as sophomores on each of the eight tests
and again as seniors on each test. It is obvious from the column
headed “Mean Gain” that in most cases there was an appreciable gain.
Only one area, Physical Science, showed fundamentally no gain at all.
This specific result is not entirely unexpected, since very few of
these women students took additional physical science courses during
their final two years. The greatest gain was exhibited in the area of
Social Studies, followed rather closely by Effectiveness of
Expression and the General Education Index, derived from the battery
as a whole.

Gains thus exhibited may be taken at their face value, but the
question still remains as to their significance. Statistically one
approaches this problem by inquiring as to what degree of certainty
we know the gain may not have been due to mere chance factors. The
solution to this problem is through the use of the critical ratio
technique. The critical ratio, found by dividing the difference by
the standard error of the difference, may be interpreted as follows:
A critical ratio of zero means there are 50 chances in 100 that the
gain was due merely to chance. A critical ratio of 1.00 means there
are 84 chances in 100 that the

TABLE 1

Critical Ratios of G.R.E. Test Gains for 157 DePauw Women Tested as Sophomores
(1946) and Retested as Seniors (1948)

                   Means         Mean       Std. Dev.       Critical
Test           Soph.  Senior     gain     Soph.  Senior        ratio
                                                             of gain

Math.           460     494       34      86.3    97.9         3.26
Phys. Sci.      458     461        3      86.7    91.4         0.30
Biol. Sci.      497     426       29      87.7    94.6         2.82
Soc. Stud.      461     506       45      78.7    88.3         4.77
Literature      491     523       32      81.4    84.3         3.42
Arts            501     528       27      72.1    76.3         3.22
Eff. Exp.       498     538       40      91.6    86.0         3.99
Vocab.          453     476       23      75.9    83.9         2.55
G.E. Index      469     509       40      81.1    89.4         4.15



true difference is greater than zero; 2.00 means there are 98
chances in 100; while 3.00 can be taken as a practical certainty
(100 chances in 100).
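The critical ratio and its reading in “chances in 100” can be sketched as below. Two assumptions are made here that the article does not state. First, the standard error of the difference is computed as for two independent samples of 157, an inference from the fact that this form reproduces the ratios printed in Table 1. Second, the “chances in 100” are taken from the normal curve. The function names are illustrative only.

```python
import math

N = 157  # women tested as sophomores and retested as seniors

# (test, mean gain, sophomore SD, senior SD, critical ratio printed in Table 1)
ROWS = [
    ("Math.", 34, 86.3, 97.9, 3.26),
    ("Phys. Sci.", 3, 86.7, 91.4, 0.30),
    ("Soc. Stud.", 45, 78.7, 88.3, 4.77),
    ("Eff. Exp.", 40, 91.6, 86.0, 3.99),
    ("G.E. Index", 40, 81.1, 89.4, 4.15),
]


def critical_ratio(gain, sd_first, sd_second, n):
    """Gain divided by the standard error of the difference.

    Assumption: the standard error treats the two administrations as
    independent samples of size n.
    """
    standard_error = math.sqrt(sd_first ** 2 / n + sd_second ** 2 / n)
    return gain / standard_error


def chances_in_100(ratio):
    """Normal-curve chances in 100 that the true difference exceeds zero."""
    return 100 * 0.5 * (1 + math.erf(ratio / math.sqrt(2)))


if __name__ == "__main__":
    for test, gain, sd_soph, sd_senior, printed in ROWS:
        ratio = critical_ratio(gain, sd_soph, sd_senior, N)
        print(f"{test:<11} computed {ratio:5.2f}  printed {printed:5.2f}  "
              f"chances in 100: {chances_in_100(ratio):5.1f}")
```

Because the same students took both administrations, a standard error that allowed for the retest correlation would be smaller and the resulting ratios larger; the sketch follows the printed figures instead.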

We find only one critical ratio indicating a difference that is of no
consequence, the one for Physical Science, where the mean gain could
have been very much a matter of chance. Of the remaining eight
critical ratios only two, those for Biological Science and for
Vocabulary, are below the 3.00 level, but these two are sufficiently
well above the 2.00 level as to mean about 99.5 chances per 100 of
being true differences. The critical ratio is not necessarily a
measure of the size of the difference, but does indicate if a
difference is statistically significant and is not due to chance
factors. We may conclude, therefore, that the gains shown on all the
tests except Physical Science were sufficiently appreciable to be
well beyond the limits of mere chance and, therefore, represent
statistically significant progress from the sophomore year to the
senior year. Whether or not this progress is as great as a faculty
might wish is obviously still a matter of question.

A second question one would raise in connection with this 
test-retest program is to what extent did the various tests agree 
with each other when repeated after two years? This analysis 
is not to be confused with the concept of statistical reliability. 


TABLE 2

Retest Correlations Between Sophomore (1946) and Senior (1948) Administration of
G.R.E. Tests to 157 DePauw Women

Test                              Correlation
                                  Soph. vs. Senior

Mathematics                         .689
Physical Science                    .71[?]
Biological Science                  .616
Social Studies                      .739
Literature                          .639
Arts                                .752
Effectiveness of Expression         .705
Vocabulary                          .851
General Education Index             .897


TABLE 3

Correlation Between General Education Index (GRE) and Scholastic Grade Averages
(PHR) of 157 DePauw Women

Variables Correlated              Correlation

Soph. GRE vs. Soph. PHR             .603
Soph. GRE vs. Senior PHR            .637
Senior GRE vs. Soph. PHR            .549
Senior GRE vs. Senior PHR           .604


Statistical reliability in the process of test construction is
determined by test-retest correlation where the examinations are
administered relatively close together, so that there is little
chance of actual change occurring. However, in this instance the
two-year lapse between the tests permitted considerable opportunity
for educational gain, not necessarily uniform from student to student
on each of the tests. This was due to the situation whereby various
students took different curricula and, therefore, made more gains in
some of the sub-tests than in some of the others.
















In Table 2 we have presented the retest correlations between the
sophomore and senior administration of the tests to these same 157
DePauw women. Seven of the sub-tests exhibit retest correlations of
.75 or less over the two-year interval. This is not surprising
because of the varying degrees of educational growth in each area for
each student. Vocabulary did achieve a retest correlation of .85,
which indicates considerable consistency from one administration to
the next. Vocabulary is not a subject-matter area, but rather an
index of general intelligence, and would be expected to show higher
retest correlation than the specific subject-matter areas. The
General Education Index, exhibiting a correlation of .897, shows that
the battery as a whole is remarkably consistent, even when
administered with a two-year interval of time between test and
retest. This high retest correlation for the battery as a whole may
be interpreted as meaning students earning a score in the top
brackets in the sophomore year would almost certainly earn scores in
the top bracket as seniors. In other words, the matter of gain is a
relative factor and the degree of gain is marked by considerable
consistency throughout the battery as a whole.
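The point that a high retest correlation reflects consistency of relative standing, whatever the size of the gain, can be illustrated with a small sketch. The scores below are hypothetical and are not taken from the DePauw data; `pearson_r` is simply the ordinary product-moment correlation written out in full.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical sophomore scores for six students.
sophomore = [420, 455, 470, 500, 530, 560]

# Every student gains 40 points: the means differ, the ranking does not.
senior_uniform = [score + 40 for score in sophomore]

# Gains that differ from student to student, as when curricula differ.
senior_uneven = [score + gain
                 for score, gain in zip(sophomore, [90, 10, 60, 0, 45, 20])]

if __name__ == "__main__":
    print(round(pearson_r(sophomore, senior_uniform), 3))
    print(round(pearson_r(sophomore, senior_uneven), 3))
```

A uniform gain leaves the correlation at unity, while gains that vary from student to student lower it, which is the pattern the sub-tests show relative to the battery as a whole.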

A third problem in which one would obviously be interested is the
relationship between the GRE General Education Index and scholastic
grade averages at DePauw for these students. In Table 3 we have
presented the correlation coefficients between GRE Indexes and grade
averages (PHR) for the four possible combinations. Three of these
figures are .60 or higher, indicating a strong degree of relationship
between GRE scores and university grades. It is interesting to note
that the highest correlation is exhibited between GRE index for
sophomores and their final senior-grade averages. In this sample at
least, it seems it would have been sufficient to give the GRE to
sophomores and then to predict final senior-grade averages without
recourse to administering the tests again to the seniors. For grade
prediction this process would have been sufficient, but would not
have revealed the growth as exhibited in Table 1.



THE ASSESSMENT OF THE ACADEMIC APTITUDE 
OF THE GRADUATE STUDENT 


ROBERT M. W. TRAVERS 
and 

WIMBURN L. WALLACE 
University of Michigan 

Introduction 

This is a study of the assessment of the potentialities of the
graduate student and of the criterion of success in graduate school.
The study was undertaken at the request of the Executive Board of the
Horace H. Rackham School of Graduate Studies of the University of
Michigan, and all data were collected within that institution. The
study is one of a series of investigations conducted for the
administration of the University of Michigan.

Grades as a Criterion of Success in Graduate School 

Studies of the prediction of academic achievement are legion, 
but few of them are particularly concerned with the character¬ 
istics of the criterion of academic success. Since it is commonly 
believed that the failure of tests to make accurate academic pre¬ 
dictions is a result of the instability of students’ average grades 
from one semester to the next, it seemed wise to collect some 
evidence on that point at the beginning of the present study. 
This was done by finding the correlation between the grades of 
students in two successive semesters in their field of specializa¬ 
tion. From these correlations it is possible to estimate the num¬ 
ber of semesters of graduate work that would have to be taken 
in order for the grade-point average to be a stable criterion of 
graduate success 1. Table 1 summarizes these data. 

The estimated correlation between grades for successive years 
is based on the Spearman-Brown formula. It is fairly obvious 
from these data that the average grade for two semesters’ work 

1 A stable grade-point average is arbitrarily defined as one that would correlate 0.9 
with another grade-point average computed from an equal period of graduate studies. 




is a more stable criterion in some areas than in others. It is 
theoretically quite impossible to predict with any accuracy 
from test scores or other data the grades which a graduate stu¬ 
dent of engineering will obtain during a year of graduate studies 
since grades in that field are highly unstable for a given indi¬ 
vidual. The stability of average grades in other areas follows 
that found in previous studies, with the highest in the physical 
sciences and the lowest in education. 
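
The estimates in Table 1 can be checked against the Spearman-Brown formula. The sketch below is an illustration and not the authors' own computation: the function names are assumptions, and the Fall-Spring correlation is treated as the reliability of one semester's grades. Because the published correlations are rounded, the one-year figures agree with the table only to within about .01.

```python
def spearman_brown(r, k):
    """Correlation between two criteria, each k times as long as the original."""
    return k * r / (1 + (k - 1) * r)


def semesters_needed(r, target=0.9):
    """Lengthening factor at which the Spearman-Brown estimate reaches target."""
    return target * (1 - r) / (r * (1 - target))


# Fall-Spring correlations as printed in Table 1.
fall_spring = {
    "Social Studies": 0.66,
    "Physical Sciences": 0.68,
    "Engineering": 0.28,
    "Languages and Lit.": 0.65,
    "Education": 0.53,
}

for field, r in fall_spring.items():
    one_year = spearman_brown(r, 2)   # two semesters against two semesters
    n = semesters_needed(r)           # semesters needed for a 0.9 criterion
    print(f"{field:20s} one-year r = {one_year:.2f}   semesters = {n:.1f}")
```

Rounding the second figure to the nearest whole semester gives the last column of the table.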

Background of the Present Study 

The accurate assessment of the student’s potentialities for 
profiting from work at the graduate level is important for two 
reasons. First, there is a need for improving selection procedures. 


TABLE 1 
Stability of Grades 

                                                   Estimated correlation   Estimated no. of
                                   Correlation     of grades for 1 year    semesters' work
                        No. of     of Fall and     with grades for         providing a
                        students   Spring grades   a second year           stable criterion

Social Studies             68          .66               .79                     5
Physical Sciences          86          .68               .81                     4
Engineering                77          .28               .43                    23
Languages and Lit.         88          .65               .79                     5
Education                  68          .53               .69                     8



Second, once the student has been admitted it is important to 
be able to determine how far he should continue graduate work 
so that he can plan an appropriate program. Every graduate 
school is familiar with the student who carries a doctoral pro¬ 
gram to an advanced stage before it is realized that an alterna¬ 
tive program would have been a wiser choice. In order to pre¬ 
vent the occurrence of such cases it is necessary to establish a 
system for appraising the potentialities of the student at an 
early stage in his career. 

There have been two main approaches to the assessment of 
the graduate student by means of tests. One is that of appraising 
his “background.” In this approach it is assumed that it is most 
important for the student to enter graduate school with a cer¬ 
tain body of information from a variety of subject-matter fields. 
It makes the additional assumption that a liberal education at 












the college level supplies the student with a relatively fixed 
body of information which can be measured by tests such as 
the Graduate Record Examination. The philosophy of education 
implied by this approach is more in keeping with the goals of 
higher education of the last century than with those of the 
present decade. 

The other approach to the assessment of the graduate student 
is that of determining the extent to which he exhibits the psy¬ 
chological processes and intellectual skills which are important 
for graduate work. This approach was given limited recognition 
in the Graduate Record Examination in the verbal factor test 
and is implicit in current proposals for the revision of that ex¬ 
amination. It is also illustrated by the use by graduate schools 
of high-level tests of general ability such as the C.A.V.D. scale, 
and to a lesser extent by the Miller Analogies Test. 

There are several major reasons at the present time for avoid¬ 
ing the appraisal of the prospective graduate student in terms 
of his knowledge. First, it is impossible to identify a common 
body of knowledge which all graduate students should possess, 
and this will become a progressively more difficult task with the 
growing emphasis on intellectual skills and arts as major out¬ 
comes of a liberal education at the college level. This is true not 
only insofar as “general background” is concerned but also in 
the student’s field of specialization. Second, it would be most 
undesirable for graduate schools to indicate that a given body 
of knowledge was a requirement for graduate work. This would 
have the evil effect of the graduate schools controlling the un¬ 
dergraduate curriculum in the same way as the colleges have 
often had the unfortunate role of controlling the curriculum of 
the secondary school. Third, even if a body of essential knowl¬ 
edge could be identified, there would be no up-to-date examina¬ 
tions for measuring the extent to which this knowledge had 
been acquired. 

For these reasons, the assessment of the graduate student 
must be largely in terms of the extent to which he has mastery 
of the intellectual arts and skills necessary for success in gradu¬ 
ate work. Following the pattern of the American Council on 
Education Psychological Examination the present investigators 
devised a test for a higher level of ability which would yield a 




linguistic and quantitative score and which would measure proc¬ 
esses hypothesized to be important in graduate success. The 
test will be referred to as the Academic Aptitude Test, Graduate 
Level. 

The Nature of Test 

All items in the test were multiple-choice with five alterna¬ 
tives. The major portion of the test was liberally timed so as to 
eliminate the factor of speed. The items were grouped into five 
parts which are described below: 

Part I, Vocabulary.—Eighty words were selected from the 
technical terminology of the common areas of specialization of 
graduate work including Physical Sciences, Biological Sciences, 
Social Studies, Languages, Law, and Philosophy. 

Part II, Reading Comprehension.—This test is called a reading 
test only from custom. It requires the student to reason 
rather than to memorize what has been read. Questions of the 
following type which follow some of the passages indicate the 
kind of mental process which the test involves: 

    Which one of the following individuals is most likely to have written the above passage? 
    What is the most important practical oversight in the plan suggested? 
    What is the main purpose of the author? 
    What does the author mean by ‘regions larger than any empire of antiquity?’ 

Part III, Verbal Reasoning.—This test involves processes 
such as the identification of erroneous assumptions, inconsist¬ 
encies, justifiable in contrast to unjustifiable conclusions, and 
the making of inferences which are probably but not necessarily 
correct. 

Part IV, Quantitative Reasoning.—This test involves reasoning 
with numbers, but does not involve mathematics much beyond 
that taught in junior high school. A few, but not many, of 
the problems place a considerable emphasis on the ability to 
understand descriptions of complex data. 

Part V, Numerical Ingenuity.—In this test the examinee is 
presented with a series of numerical problems each of which 
can be solved by a short method or a long method. The examinee 





is instructed to look for the short method of solving each prob¬ 
lem. If he does not see the short method at once, he is to pass on 
to the next problem. This section of the test, unlike the other 
sections, emphasizes speed. 

The original plan of the test was to combine the scores on the 
first three parts in a verbal-factor score and to add together the 
scores on the last two parts to provide a numerical reasoning 
score. The basis for this partition of the test is found in the 
practice followed by the American Council on Education Psy¬ 
chological Examination and in the Differential Aptitude Battery 
both of which provide verbal and numerical scores which have 
been found to have differential predictive value. 

The reliabilities for the verbal and numerical scores calculated 
by means of the Kuder-Richardson (Formula 11) were found to 
be 0.86 and 0.86 respectively. The reliability for the total test 
calculated on the same basis was 0.90. The correlation between 
the verbal and numerical sections was found to be 0.20. These 
correlations are based on 484 cases. 

The correlation between the verbal and numerical section is 
much lower than that found with the American Council on Edu¬ 
cation Psychological Examination. In the latter case the correla¬ 
tion between the quantitative and linguistic sections is probably 
raised considerably by the fact that both sections involve a 
speed factor. 

Validation Procedure 

During the academic year 1948-49 the test was administered 
to 1,111 graduate students. About half of these students were 
in their first year of graduate work and the remainder had been 
in graduate school for varying lengths of time. For the purposes 
of this study, only those students who were registered for 6 or 
more hours of courses for graduate credit during each semester 
were included in the investigation. In addition, it seemed de¬ 
sirable to eliminate those foreign students who had taken their 
undergraduate work in non-English speaking countries. These 
eliminations reduced to 484 the number of cases included in the 
study, and these cases were distributed over the various areas 
of graduate study in the manner shown in Table 2. 

Some of these groups are much too small for study; consequently, 
it seemed advisable to eliminate the biological science, 
library science and miscellaneous groups from further study. 

Correlation of Test Scores With Grades 

Table 3 summarizes the correlation of test scores with aver¬ 
age grades. 

Certain facts emerge from this table which throw considerable 
light on problems of the selection and guidance of graduate 
students. First, the correlations of test scores with grades in 


TABLE 2 

Distribution of Students by Field 

Field                          No. in Each Field

Social Studies                        68
Physical Sciences                     86
Engineering                           77
Languages and Literature              88
Education                             68
Biological Science                    28
Library Science                       30
Miscellaneous                         39



TABLE 3 

Correlations of Test Scores With Grades 

                                          Part
                          I        II      III      IV       V        Verbal      Num.
                                           Verb.    Num.     Num.     Score       Score
                                                             Inge-    Parts       Parts     Total
                    N     Vocab.   Read.   Reas.    Reas.    nuity    I,II,III    IV,V      Score

Social Studies      68     .46      .24     .09      .10      .05       .36        .09       .31
Physical Science    86     .08      .46     .31      .33      .09       .18        .27       .27
Engineering         77     .04      .10     .03      .14      .01       .08        .10       .10
Education           68     .45      .42     .38      .16      .28       .49        .24       .47
Lang. and Lit.      88     .41      .27     .34      .35      .24       .47        .37       .50



engineering are negligible in magnitude. This is in accordance 
with expectations since the criterion represents for practical 
purposes an unpredictable variable. 

Second, there are great differences between the areas of study 
in the abilities associated with grades. In social studies and 
languages the part which has the highest predictive value in¬ 
volves vocabulary rather than reasoning. In the physical sci¬ 
ences, on the other hand, it is the sub-tests involving reasoning 
which have the highest correlation with grades. This does not 
mean that success in the physical sciences depends upon reasoning 
abilities while success in languages and social studies does 
not. However, it is consistent with the observation that exami¬ 
nations given in the physical sciences call for problem solving 
and reasoning abilities while those given in languages and the 
social sciences commonly call for memory of facts rather than 
thinking skill. It is probable that the achievement of under¬ 
standing in most fields involves reasoning, but in many fields 
grades are assigned on the basis of the amount of accumulated 
knowledge rather than on the basis of the amount of under¬ 
standing achieved. 


TABLE 4 

Multiple Correlation Between Test Scores and Average Grades 

Field                            R

Social Studies                  .49
Physical Sciences               .51
Engineering                     .17
Education                       .54
Languages and Literature        .50



TABLE 5 

Comparison with Miller Analogies Test 

                              Multiple correlation       Correlation of Miller
                              of Academic Aptitude       Analogies Test with
Field                         Test with average grade    average grade

Social Studies                        .49                       .18
Physical Sciences                     .51                       .38
Engineering                           .17                       .09
Education                             .54
Languages and Literature              .50                       .34



Third, the original hypothesis that a verbal and a numerical 
score represented useful measuring categories does not seem to 
be consistent with the data. Inspection indicates that differ¬ 
ential prediction would be much more effective if the sub-tests 
were grouped, not into numerical and verbal categories, but 
into reasoning and vocabulary categories. 

Fourth, since there is great variation in the extent to which 
each of the sub-tests predicts success in each of the areas of 
study, it would seem that scores on sub-tests should be differ¬ 
entially weighted before they are added together in order to 
maximize the accuracy of prediction for a given area of study. 






















Correlations Between Weighted Scores and Average Grades 

Table 4 shows the correlations between a composite of sub¬ 
scores weighted to give maximum predictions and average 
grades. These multiple correlations are, with the exception of 
the one for engineering, of sufficient magnitude to justify the 
use of the test as one aspect of the assessment and guidance of 
the graduate student. 
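
How differential weighting of sub-scores produces a multiple correlation can be sketched for the simplest case of two predictors. The example is illustrative only. The function name and the intercorrelation of 0.50 between the two sub-tests are assumptions, since the sub-test intercorrelations within each field are not reported here; .46 and .24 are the Social Studies correlations of Parts I and II with grades from Table 3.

```python
import math


def two_predictor_regression(r1, r2, r12):
    """Standardized (beta) weights and multiple R for two predictors.

    r1, r2 : correlation of each predictor with the criterion
    r12    : correlation between the two predictors
    """
    denom = 1 - r12 ** 2
    beta1 = (r1 - r2 * r12) / denom
    beta2 = (r2 - r1 * r12) / denom
    # With standardized variables, R squared is the weighted sum of validities.
    r_squared = beta1 * r1 + beta2 * r2
    return beta1, beta2, math.sqrt(r_squared)


# r12 = 0.50 is a hypothetical value chosen only for illustration.
beta1, beta2, multiple_r = two_predictor_regression(0.46, 0.24, 0.50)
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}, R = {multiple_r:.3f}")
```

Under these assumed inputs nearly all of the weight falls on the first predictor, and the multiple correlation barely exceeds the larger of the two single correlations.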

It is interesting to compare the above correlations with those 
found with the Miller Analogies Test using a similar criterion of 
average grade over a year’s work in the Horace H. Rackham 
School of Graduate Studies. This comparison is shown in Table 5. 


TABLE 6 

Beta Weights for Parts of Test 























grades. These weights indicate the contribution which each 
part makes independent of all other parts. They are reported 
in Table 6. 

This table again suggests rather strongly that in the social 
studies, education, and languages, grades are based on different 
criteria than they are in the physical sciences. It indicates also 
that a test of aptitude for graduate students would be better 
structured by having a vocabulary and a reasoning section 
rather than a verbal and a numerical section. In some ways this 
is rather surprising since the intercorrelations of the sub-tests 
in the battery show that, in general, the sub-tests cluster into 
the verbal and numerical categories. This is illustrated in Table 
7 which shows the interrelationships between the sub-tests cal¬ 
culated on the basis of 484 cases. 

Summary and Conclusions 

The present study was concerned with the assessment of the 
student’s aptitude for graduate work. It was demonstrated that 
if success in graduate work is measured by grades then it is pos¬ 
sible only in certain fields to make predictions of success. In the 
present case, grades in engineering lacked homogeneity from 
one semester to the next and did not constitute a reasonably 
predictable criterion. This statement should not be generalized 
to imply that in other graduate schools grades in engineering 
would lack consistency from one semester to the next. However, 
it does imply that graduate schools should from time to time 
check on the stability of average grades since they are a basis 
for awarding degrees. If grades are unstable from one semester 
to the next then any degree awarded on the basis of them is 
awarded arbitrarily. 

The predictive value of a test designed to give a verbal-ability 
and a numerical-ability score was studied. It was found, how¬ 
ever, that the test did not give best predictions when this type 
of partitioning was used. The evidence indicates that it would 
be better to partition the test into a vocabulary and a reasoning 
section and to weight these parts differentially for making pre¬ 
dictions in various fields. This finding is important in view of 
the fact that two of the major testing organizations have an¬ 
nounced plans for providing a verbal ability and numerical 
ability test for the same level of difficulty as the present test. 



MEASURING ORIGINALITY IN THE 
PHYSICAL SCIENCES 1 

MILTON M. MANDELL 

Examining and Placement Division, United States Civil Service Commission 

The United States Civil Service Commission started in Oc¬ 
tober, 1947, a study of selection methods for physicists, 
chemists, and engineers. The following report is an interim one 
which describes the selection methods which seem to predict 
best the ability to perform research work in the physical 
sciences, based on a try-out of tests on more than 600 chemists, 
physicists, and engineers. 

The data below are presented in three forms. In the first 
place, there are presented correlations between test scores and 
ratings by colleagues and supervisors on a five-point graphic 
rating scale on an item described as: “Originality of thinking— 
what is his ability in creative thinking? How original is he in 
his approach to problems when originality is necessary?” The 
second method used was to identify those scientists who were 
engaged in basic research work; this was done in order to 
determine the correlation between test scores and job per¬ 
formance on the basis of an over-all evaluation on a graphic 
rating scale by colleagues and supervisors. The third method 
was to determine the significance of the difference between the 
mean scores of research personnel and those of non-research 
personnel on the tests used. 

Where the criterion was the summation of ratings of col¬ 
leagues and supervisors, the method was to add together the 
ratings by all colleagues and supervisors and to divide these 
ratings by the total number of ratings, obtaining an average 
unweighted score. 

1 This study was carried on by the Civil Service Commission as part of its regular 
program for the improvement of selection methods. Part of the work that was done on 
this project was performed by persons employed by the American Council on Education 
in its contract with the Scientific Personnel Division of the Office of Naval 
Research. Neither organization assumes any responsibility for the contents of this report. 


A large number of tests were included in this study. These 
tests are: 

(1) figure analogies (abstract reasoning) 

(2) Gottschaldt figures 

(3) spatial relations tests, including cube-turning, surface de¬ 
velopment, and a test developed by a member of the 
Civil Service Commission which is similar to the block¬ 
building test 

(4) formulation 

(5) letter series 

(6) table reading 

(7) vocabulary 

(8) interpretation of data 

(9) hypotheses 

(10) scrambled sentences 

(11) subject matter 2 

Statistical data are not furnished in this report for many of 
these tests. In most cases, the reason for the omission of these 
data is that the correlations were not computed; the scatterplots 
indicated no significant correlations between the test scores and 
the criteria. Where the correlations were computed, they were 
not significantly different from zero. 

As will be noted below, many of these tests are quite brief in 
terms of number of items. This is considered a preliminary 
study and it was thought advisable to try out a large number 
of item types. Because of the short testing time available, it 
was necessary to abbreviate these tests, in some cases probably 
to a level below that needed for obtaining significant data on 
their value or lack of value. 

1. Relationship of test scores to ratings on originality 3.—Subject-matter 
tests produced significant correlations with ratings 
on originality. For example, for 35 physicists at the National 
Bureau of Standards at grades P-1 through P-7, the correlation 
with a test of approximately 100 items in the basic field of 
physics was +.59. For 58 chemists at the Eastern Regional 
Research Laboratory of the Department of Agriculture in 
grades P-1 through P-4, the correlation with a basic test in 

2 These tests are described in an article, “Selection of Physical Scientists,” by 
Milton M. Mandell and Sidney Adams, Educational and Psychological Measurement, 
VIII (1948), 575-582. 

3 All correlations included in this report are Pearson product-moment correlations 
unless otherwise noted. 




chemistry of approximately 100 items was +.46. For 17 chemists 
at the Bureau of Standards in grades P-1 through P-6, the 
correlation was the same, +.46. For 53 cases at the Western 
Regional Laboratory of the Department of Agriculture, the 
tetrachoric correlation was +.46 for the chemistry test, with 
the sample including chemists at P-1 through P-4. For 19 
electronics engineers at the Naval Electronics Laboratory in 
grade P-2, the correlation between the same basic physics test 
that was given to physicists and ratings on originality was 
+.58. 

In addition to the subject-matter test, other tests produced 
interesting results. For a test of approximately 35 items pre¬ 
pared by Professor Max Engelhart of the Chicago City Junior 
Colleges on the ability to evaluate hypotheses, the correlation 
with ratings on originality for 31 chemists at the Bureau of 
Standards in P-1 through P-6 was +.49. This result did not 
stand up with the Eastern Regional sample of chemists; how¬ 
ever, the correlation for this test at the Eastern Regional 
Laboratory for 45 chemists engaged in basic research work, 
when over-all ability in basic research was the criterion, was 
+.44. 

In addition to the subject-matter and hypotheses tests de¬ 
scribed above, a test in basic college mathematics of approxi¬ 
mately 30 items correlated +.41 for 62 physicists at the Bureau 
of Standards, with ratings on originality being used as the 
criterion. 

2. Critical ratios between research and non-research groups.—A 
number of tests provided significant differences between the 
mean scores of those engaged in research work, either basic or 
applied, and of those engaged in auxiliary work in the sciences, 
such as testing. 

The formulation test, in the form administered at the Bureau 
of Standards, consisted of 15 items which involved the ability 
to translate a narrative statement into an algebraic equivalent. 
It produced significant differences at the 1 per cent level of 
confidence between 20 chemists engaged in research work and 
6 chemists not in research work. The mean score of the research 
workers was 8.9, and the mean score of the non-research workers 
was 5.5. 





A scrambled sentences test of seven items which involved the 
ability to determine what the last word of the sentence would 
be if the sentence were correctly arranged also produced a 
significant difference at the 1 per cent level of confidence be¬ 
tween research and non-research chemists. There were 28 
chemists in the research group with a mean score of 2.8, and 
8 chemists in the non-research group with a mean score of 1.9. 

For physicists, the same formulation test described above 
also differentiated between research and non-research physicists 
at the 1 per cent level of confidence. The mean score of 23 
research physicists in the formulation test was 11.0, while the 
mean for 13 non-research physicists was 7.8. 

The table reading test of the Air Force was included in the 
battery of tests. This is essentially a test of carefulness, visual 
acuity, and attention to detail. For a group of 56 engineers, 17 
of whom were in research work and 39 of whom were in non¬ 
research work, this test, which takes about 7 minutes to ad¬ 
minister, produced a critical ratio significant at the 5 per cent 
level of confidence. The mean score for research engineers was 
17.9; the mean score for non-research engineers was 13.9. These 
engineers were in the electrical and mechanical fields at the 
Naval Ordnance Laboratory and were in grades P-1 through P-3. 

The same table reading test produced significant results on 
another population of engineers. This sample of 52 engineers in 
grade P-3 consisted of 29 research engineers and 23 non-research 
engineers. In this case, the test score was the number wrong 
rather than the number right. The mean score of the non¬ 
research engineers in terms of number wrong was higher than 
the mean score of the research engineers, with a difference 
significant at the 5 per cent level of confidence. The mean score 
for the research engineers was .57 wrong answers; the mean 
score of number wrong for the non-research engineers was 1.24 
answers. 

A similar analysis was based upon 87 engineers at the Naval 
Electronics Laboratory in grades P-1 through P-4. Thirty-two 
of these engineers are in research work and 55 are in non¬ 
research work. Differences which were significant at the 1 per 
cent level of confidence, in favor of the research engineers, were 




obtained on the formulation test described above and on a 
vocabulary test. There was a difference of five points in the 
mean scores on these two tests in favor of the research engineers. 

A significant difference at the 1 per cent level of confidence 
was also obtained on a test of spatial relations in favor of the 
non-research engineers. This test in spatial relations was pre¬ 
pared by a member of the staff of the United States Civil 
Service Commission and is similar to the block-building test 
frequently used. The average score of the non-research en¬ 
gineer on this test was 16 points higher than the average score 
of the research engineer. 

3. Correlation of test scores with ability in basic research.—It 
was possible to obtain a group of 45 chemists from the Eastern 
Regional Research Laboratory who were engaged in basic re¬ 
search work. These chemists were rated by colleagues and 
supervisors on a five-point graphic rating scale. The method 
for determining the rating on basic research ability was to add 
up the ratings on over-all ability and divide by the number of 
ratings. In addition to the results with the hypotheses test 
mentioned above, namely, a correlation of +.44, a correlation 
of +.61 was also obtained with the basic chemistry test de¬ 
scribed above for these 45 chemists in basic research. This was 
the only group in basic research sufficiently large in numbers 
to justify the isolation of the group for correlation purposes. A 
number of other tests were tried out with this group but none 
produced significant results. 

Summary 

1. The formulation test seems to have the widest usefulness 
in differentiating research from non-research personnel. Sig¬ 
nificant differences at the 1 per cent level were obtained with 
samples from the fields of physics, chemistry, and engineering. 

2. Subject-matter tests also provided pertinent data for 
physicists, chemists, and engineers, using ratings on originality 
as the criterion. 

3. The other tests produced significant results but their use¬ 
fulness was more limited. The mathematics test correlated sig¬ 
nificantly with ratings on originality for physicists; the 





scrambled sentences test differentiated between research and 
non-research chemists; the table reading, vocabulary, and a 
form of block-building test produced significant data for 
engineers. 

4. The results obtained for these tests from the various 
samples differed in a number of cases. The differences may be 
due to differences in the samples, the nature of the work, 
differences in criteria content, or reliability. 



PROBABILITY APPROACH TO FORECASTING 
UNIVERSITY SUCCESS WITH MEASURED 
GRADES AS THE CRITERION 


L. J. LINS 

University of Wisconsin 

For some time, emphasis has been placed upon the ability 
to forecast academic success in terms of grade-point averages 
at the University of Wisconsin. As early as 1909, Dearborn 1 
attempted to discover whether relative standings in the sec¬ 
ondary school were indicative of academic success at the Uni¬ 
versity. As time progressed, various persons investigated the 
possibilities of “predicting” grade-point averages through mul¬ 
tiple regression. 

In September, 1928, 1687 University of Wisconsin freshmen 
took the American Council Psychological Examination. Seven 
hundred and fifty-six of these freshmen were selected as a 
sample. All were in attendance at the University for at least 
one year after taking the examination. American Council per¬ 
centiles (based upon national norms) and high-school percentile 
ranks were computed. Zero-order Pearson-Product-Moment 
Coefficients of Correlation were then calculated between these 
respective factors and grade-point averages for the freshman 
year. A multiple coefficient of correlation of .711 resulted. 2 Thus 
about 50 per cent of the variance of grade-point average was 
associated through regression with the two independent vari¬ 
ables named. This shows a substantial concomitant variation. 
Since no factors have been found which would forecast uni¬ 
versity success better, it was thought advisable to employ the 
American Council Psychological percentile and the high-school 
rank percentile in this study. 

The approach here employed is one of trying to set up a 

1 Gustav J. Froehlich, “The Prediction of Academic Success at the University of 
Wisconsin,” The University of Wisconsin Bureau of Guidance and Records, Bulletin 
2J74, Series 2358, October 1941, p. 3. 

2 Ibid., 20-22. 




system of success or failure probabilities associated with bi¬ 
variate quarter ranges of the American Council Psychological 
and high-school rank percentiles. The sample consists of 1789 
freshmen, 1189 men and 600 women, who entered the Uni¬ 
versity of Wisconsin, First Semester, 1948-49. All are residents 
of Wisconsin and were graduated from Wisconsin high schools. 
Nine per cent were graduated with secondary classes of 30 or 
fewer students, 13 per cent with classes of 31 to 60, 27 per cent 
with classes of 61 to 150, and 51 per cent with classes of 151 or 
over. 

The American Council Psychological Examination, 1947 edition 
(local norms), and high-school rank percentiles were computed 
by the Student Counseling Center, University of Wisconsin. 
First-semester grade-point averages at the University 
and the above-mentioned percentiles were recorded. The subjects 
were divided into two groups by sex in an attempt to 
discover whether or not the men and women differed significantly 
in the factors under consideration. If significant differences 
were found, it would indicate that, for the type of 
approach herein described, it would be better to forecast university 
success separately by sex rather than by using the whole 
group without reference to sex. 

TABLE 1 

Mean, Standard Deviation of the Distribution, Standard Error of the Mean, and Critical 
Ratio of the Difference Between Means for the Samples Used 

                                          Men                          Women                 C.R. of
                                                                                             Diff. of
Variable                          M        σ       σ_M         M        σ       σ_M          Means

Grade-Point Average              1.178     .841     .026       1.409     .810     .035         5.33
High-School Rank Percentile     64.936   25.52      .793      77.101   21.30      .920        10.02
American Council Psychological
  Percentile                    52.315   28.55      .876      47.800   27.09     1.169         3.09


The groups are described in Table 1. It is seen that the freshman
women maintained a higher mean grade-point average than the men in
their first semester at the University of Wisconsin and ranked higher
on the average in the high-school classes with which they were
graduated. However, the mean percentile rank of the men on the
American Council Psychological Examination was higher than that of the
women.

Again referring to Table 1, one notes that the means of the
men and women on the three variables differ significantly. This
would indicate that there is cause for keeping the samples of
men and women separate.
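
The critical ratios in Table 1 can be checked with a short sketch. It assumes the usual form for independent samples, the difference between the means divided by the square root of the sum of the squared standard errors; the function name is illustrative only.

```python
import math

def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Critical ratio of the difference between two independent means."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # standard error of the difference
    return (mean_a - mean_b) / se_diff

# High-school rank percentile as read from Table 1: women first, then men.
cr_rank = critical_ratio(77.101, 0.920, 64.936, 0.793)
print(round(cr_rank, 2))
```

Applied to the other rows, the same function comes within rounding error of the tabled ratios, since the standard errors are printed to only three decimals.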

A significant association is found between grade-point average and
high-school rank, and between grade-point average and the results of
the American Council Psychological Examination, using the Pearson
product-moment method of correlation. The correlation coefficients
together with the critical ratios are presented in Table 2.

In setting up the system of success or failure probabilities, each of
the two groups, that is men and women, was then subdivided into 16
bi-quarter categories according to percentile rank on the American
Council Psychological Examination and in high-school class. The
resulting “cells” were composed of individuals who had approximately
the same percentile ranks. For example, all individuals who ranked
between the first and the twenty-fifth percentile on both factors
would be in the same “cell.”

TABLE 2

Coefficients of Correlation with the Dependent Variable of First-Semester Grade-Point
Average Together with the Critical Ratio of the Coefficient*

                                                    Men                    Women
Variable                                        r          C.R.r       r       C.R.r

High-School Rank Percentile                 (illegible)    18.73      .623     14.48
American Council Psychological Percentile   (illegible)    1?.41      .489     11.36

* $\mathrm{C.R.}_r = \dfrac{r}{\sigma_r}$, where $\sigma_r = \dfrac{1}{\sqrt{N-1}}$

Each “cell” was then divided according to grade-point averages of the
individuals within the “cell.” In computing grade-point averages at
the University three grade points are assigned for each credit at
grade of A, two for each grade of B, one for each grade of C, zero for
each grade of D, minus one-half for each condition grade, and minus
one for each grade of failure. Averages as computed followed this
pattern. Therefore a B average was considered as 2.00-2.99, C as
1.00-1.99, D as 0.00-0.99, and Fail as -1.00 to -0.01. Frequency
distributions were then set up for each “cell” and percentages based
upon the total individuals in the “cell” computed. In addition, since
a grade-point average of 1.00 is necessary for satisfactory progress
in the University, all grade-point averages of 1.00 or above were
considered as successful. As presented in Table 3 and Table 4, this
gave the probability of success of entering freshmen.
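
The grading scheme just described can be sketched as follows. The grade labels (including "Con" for a condition grade), the function names, and the sample record are illustrative assumptions; the point values and the class limits are those given in the text.

```python
# Grade points per credit as described in the text; "Con" is a hypothetical
# label for a condition grade.
GRADE_POINTS = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0, "Con": -0.5, "F": -1.0}

def grade_point_average(record):
    """record: list of (credits, grade) pairs for one semester."""
    total_credits = sum(credits for credits, _ in record)
    points = sum(credits * GRADE_POINTS[grade] for credits, grade in record)
    return points / total_credits

def grade_level(gpa):
    """Classify an average into the levels used in Tables 3 and 4."""
    if gpa >= 2.0:
        return "B or better"
    if gpa >= 1.0:
        return "C"
    if gpa >= 0.0:
        return "D"
    return "Fail"

def is_successful(gpa):
    """An average of 1.00 or above counts as success."""
    return gpa >= 1.0

# Hypothetical 15-credit record.
example = [(3, "A"), (4, "B"), (3, "C"), (5, "D")]
gpa = grade_point_average(example)
print(round(gpa, 2), grade_level(gpa), is_successful(gpa))
```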

TABLE 3

Probability of Academic Success of New Male Freshmen Based Upon High-School
Percentile Rank and Percentile Rank American
Council Psychological Examination*

High-School                  American Council Psychological Percentile
Rank            Grade
Percentile      Level         0-24       25-49      50-74      75-100

75-100          B              14         19         32         45
                C              49         56         51         45
                Sum            63         75         83         90
                - - - - - -   (49)  - -  (107) - -  (136) - -  (228) - -
                D              33          ?         15          9
                Fail            4          ?          2          1
                Sum            37         25         17         10

50-74           B               6          5         14         14
                C              40         50         52         55
                Sum            46         55         66         69
                - - - - - -   (85)  - -  (83)  - -  (91)  - -  (65)  - -
                D              46          ?         27         28
                Fail            8          ?          7          3
                Sum            54         45         34         31

25-49           B               1          0          8          0
                C              28         33         40         64
                Sum            29         33         48         64
                - - - - - -   (80)  - -  (66)  - -  (48)  - -  (22)  - -
                D              55         51         44         18
                Fail           16         16          8         18
                Sum            71         67         52         36

0-24            B               0          6          0         18
                C              17         24         47         24
                Sum            17         30         47         42
                - - - - - -   (60)† - -  (33)  - -  (19)  - -  (17)  - -
                D              50         30         32         35
                Fail           33         40         21         23
                Sum            83         70         53         58

* Probability of success is based upon experience with first-semester freshmen
1948-49 who were graduates of Wisconsin High Schools. The interpretation might
be as follows:

It has been our experience that 83 per cent of the men ranking below the twenty-
fifth percentile on the American Council Psychological Examination (local norms)
and in high-school class were not successful as first-semester freshmen.

† The number in parentheses is the size of the sample. Numbers above the broken
line are probabilities of receiving a C or B or better average. The sum of these two is
the probability of success.


In addition to the interpretation as presented in the footnotes
of Tables 3 and 4, it seemed desirable to determine a point at
which the probability of success would be equal to, or greater
than, the probability of failure. Integral values from one to
four were assigned to the percentile divisions by quarters of the
American Council Psychological and high-school rank. Thus the
quarter 0-24.9 has a value of one, 25-49.9 a value of two,
50-74.9 a value of three, and 75-99.9 a value of four. It is
interesting to note for the men, excluding the lowest quarter


TABLE 4

Probability of Academic Success of New Female Freshmen Based upon High-School
Percentile Rank and Percentile Rank American Council Psychological Examination*

High-School                  American Council Psychological Percentile
Rank            Grade
Percentile      Level         0-24       25-49      50-74      75-100

75-100          B              14         19         37         60
                C              54         61         53         34
                Sum            68         80         90         94
                - - - - - -   (63)  - -  (107) - -  (106) - -  (115) - -
                D              27         20          9          6
                Fail            5          0          1          0
                Sum            32         20         10          6

50-74           B               2          8         13         22
                C              38         44         52         50
                Sum            40         52         65         72
                - - - - - -   (47)  - -  (54)  - -  (23)  - -  (14)  - -
                D              49         39         35         14
                Fail           11          9          0         14
                Sum            60         48         35         28

25-49           B               0          6          0          0
                C              44         44         29          0
                Sum            44         50         29          0
                - - - - - -   (25)  - -  (16)  - -  (7)   - -  (0)   - -
                D              36          ?         71          0
                Fail           20          ?          0          0
                Sum            56         50         71          0

0-24            B               0          0         25          0
                C              31         33         25          0
                Sum            31         33         50          0
                - - - - - -   (16)† - -  (3)   - -  (4)   - -  (0)   - -
                D              38         67         25          0
                Fail           31          0         25          0
                Sum            69         67         50          0

* Probability of success is based upon experience with first-semester freshmen
1948-49 who were graduates of Wisconsin High Schools. The interpretation might be
as follows:

It has been our experience that 60 per cent of the women ranking below the twenty-
fifth percentile on the American Council Psychological Examination (local norms)
and between the fiftieth and seventy-fifth percentile in high-school class were not
successful as first-semester freshmen.

† The number in parentheses is the size of the sample. Numbers above the broken
line are probabilities of receiving a C or B or better average. The sum of these two is
the probability of success.

in high-school rank, that if the quarter value of high-school
rank is added to the quarter value of the American Council
Psychological, generally speaking, a sum of five or more indicates
a 50-50 or greater chance of academic success. A sum of
six or more indicates at least a 64-36 chance of success and a
sum of seven at least 69-31.
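
The quarter-value rule can be checked against the men's success percentages with a minimal sketch. The dictionary holds the success sums as read from Table 3, with the lowest high-school quarter excluded as in the text; the names are illustrative only.

```python
def quarter_value(percentile):
    """Map a percentile rank to the integral quarter values one to four."""
    return min(int(percentile // 25) + 1, 4)

# Per cent of men succeeding, keyed by
# (high-school quarter value, American Council quarter value).
MEN_SUCCESS = {
    (4, 1): 63, (4, 2): 75, (4, 3): 83, (4, 4): 90,
    (3, 1): 46, (3, 2): 55, (3, 3): 66, (3, 4): 69,
    (2, 1): 29, (2, 2): 33, (2, 3): 48, (2, 4): 64,
}

def lowest_success(min_sum):
    """Smallest success percentage among cells whose values sum to min_sum or more."""
    return min(p for (hs, ace), p in MEN_SUCCESS.items() if hs + ace >= min_sum)

def exceptions(min_sum=5, threshold=50):
    """Cells that reach the sum but fall short of an even chance."""
    return sorted(cell for cell, p in MEN_SUCCESS.items()
                  if sum(cell) >= min_sum and p < threshold)

print(lowest_success(6), lowest_success(7), exceptions())
```

The one cell returned by `exceptions()` is the case covered by the qualifier "generally speaking."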










The same generally holds true for the sample of women if all
“cells” below the fiftieth percentile in high-school rank are
eliminated. The exceptions are “cells” 1-3 and 2-2, high-school
rank being given first. This may be due to the small frequency
and consequent inadequacy of sampling in the lower ranges.

Since percentile rank in high-school class is directly affected
by size of class, it is assumed that any forecasts for persons
graduated with small classes will not be particularly valid. It
was thought advisable, therefore, either to eliminate graduates
of small high schools or to arrive at separate success
probabilities for this group.

In applying the regression equation in use at the University
of Wisconsin for forecasting grade-point averages, it was found
that a difference in percentile rank of one would not affect the
forecasted grade-point average by more than 0.05 grade point
where the size of class is 30 or above. A graduating class of 30
was then selected as the division point between the small and
large high school. In applying the same procedure for success
probabilities as outlined for the whole group, it was found that
eliminating the graduates of small high schools did not affect
the probabilities previously reported. It was impossible to arrive
at any accurate success probabilities for the small class because
of limited size of sampling. Thus the probabilities of the group
of fewer than 30 in graduating class and the group from classes
of 31 or more students are not reported here.

It would seem from the results presented that success
probabilities could be very beneficial in the educational guidance
program both before entering the University and during Freshman
Orientation Week. Power of discrimination seems evident.
With larger samples and differentiation by colleges, the
probability forecast might well take the place of the grade-point
average forecast. It might also be more readily understood by
the prospective student. Rough measures have been used. It is
the feeling of the writer that the results have been interpreted
previously as if these rough measures were precision instruments.
Therefore possibly too much emphasis has been placed
earlier upon the small differences between forecasted grade-point
averages.



PREFERENCES AND BEHAVIOR RATINGS OF 
DOMINANCE 


WILLIAM R. BIRGE 
Rensselaer Polytechnic Institute 

It is well recognized that there is not a necessary correspondence
between a person's conduct and his report of his
conduct. This situation is generally acknowledged in the field
of interest and personality measurement. Meehl and Hathaway
(2) have observed that whether or not a person reports his
conduct accurately on a questionnaire, his answers may still
constitute a significant aspect of his behavior. Kuder (1) points
out that there is no necessary relation between scales on his
Preference Record—Personal and the corresponding areas of
actual behavior, but he believes that the use of a number of
relatively independent scales is a promising starting point for
prediction studies.

This paper, however, is concerned with the question
of whether there is a correspondence between conduct and
verbal report. The criterion of behavioral ratings was used as
the measure of conduct, while the verbal responses were obtained
through the use of the Kuder Preference Record—Personal.

In connection with another study, the writer obtained sociometric
ratings on the trait of dominance from the members
of eleven fraternity groups, three sorority groups and two
female dormitory groups. Dominant individuals were defined
as those who “show the greatest assertiveness and ability to
influence others in group situations.” The ratings were made
on a total of 827 subjects.

With the exception of three small fraternities, the four members
from each group who received the highest ratings on
dominance and the four members who received the lowest
ratings were selected for further study. From the three small
fraternities, only two members from each extreme of the dominance
ratings were selected. Since, in two groups, three individuals
tied for the third from the lowest ratings, there were
58 subjects in the high dominant extreme and 60 subjects in
the low dominant extreme.

All of these subjects were requested to fill out the Kuder
Preference Record—Personal. The response was fairly good.
Although 11 members of the high dominant group and 15 members
of the low dominant group refused to cooperate in the
study, there remained a pool of 92 records for analysis. Forty-seven
of these records had been filled out by subjects who
received the highest ratings for dominance, while 45 forms had
been filled out by subjects who received the lowest ratings for
dominance.


TABLE 1

The t's of the Differences Between the Mean Scores on Each of the Six Scales for the High
and Low Dominant Groups

          Mean Score (N = 47)     Mean Score (N = 45)
Scale     High dominant group     Low dominant group       t        p

A               41.17                   38.04             2.09     .04
B               31.40                   30.96              .71     .48
C               37.09                   33.44             1.54     .12
D               37.91                   40.76             1.18     .24
E               51.31                   47.16             2.27     .02
H               74.64                   81.87             2.10     .04


The 92 records were scored for scales A, B, C, D, and E.
The five areas of activity related to these scales are as follows:

A. Preference for taking the lead and being in the center of
activities involving people.

B. Preference for dealing with concrete problems and everyday
affairs rather than interest in imaginative activities.

C. Preference for thinking, philosophizing, and speculating.

D. Preference for pleasant and smooth personal relations
which are free from conflict.

E. Preference for activities involving the use of authority
and power.

In addition to the five regular scales, the records were also
scored on the H scale, an experimental scale designed to
measure the degree to which an individual deliberately tries to
make a good impression on the test as a whole. It has been
found that individuals who attempt to make a good impression,
rather than to answer sincerely, generally receive low H
scores. A personal communication from Mrs. Phyllis Cram, of
Sears, Roebuck and Co., suggested that the H scale might
discriminate between dominant and non-dominant groups.
Mrs. Cram tested several administrators with the Kuder Preference
Record—Personal, and found that the abler administrators
tended to receive lower scores on the H scale than did the
less able administrators. Mrs. Cram believes that, in this case,
there should be no implication that the good administrators
were insincere in their answers. She suggests the explanation
that these people are “adept at creating a good impression. . . .
They are playing their roles expertly, and an effective actor is
always sincere even if it is a role.”

After the records had been scored, the mean scores on each
of the six scales were determined for the high and low dominant
groups. The t's of the differences between these means were
then computed. The results of this analysis are presented in
Table 1.

As indicated in this table, the differences between the high
and low dominant groups on the three scales A, E, and H are
significant at the 5 per cent level of confidence. (The H scale
means of the two groups were, however, within the “honest”
limits.) More specifically, in terms of expressed preferences,
these results indicate that the highly dominant person tends to
differ from the person with low dominance ratings as follows:

(1) he prefers to take the lead and be in the center of
activities involving people;

(2) he prefers activities involving the use of authority and
power;

(3) he prefers activities ordinarily chosen by people trying
to make a good impression.
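
The probabilities attached to the t's can be approximated with a short sketch. It assumes a two-tailed test and, because the two groups together supply 90 degrees of freedom, uses the normal curve in place of the t distribution; the paper does not state how its p values were obtained, so this is an illustration and not the author's procedure.

```python
import math

def two_tailed_p(t):
    """Two-tailed probability of |t| under a normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

# t values as reported in Table 1.
T_VALUES = {"A": 2.09, "B": 0.71, "C": 1.54, "D": 1.18, "E": 2.27, "H": 2.10}

P_VALUES = {scale: round(two_tailed_p(t), 2) for scale, t in T_VALUES.items()}
SIGNIFICANT = sorted(s for s, t in T_VALUES.items() if two_tailed_p(t) < 0.05)
print(P_VALUES, SIGNIFICANT)
```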

REFERENCES

1. Kuder, G. Frederic. Manual for Kuder Preference Record—Personal.
Chicago: Science Research Associates, 1949.

2. Meehl, Paul E. and Hathaway, Starke R. “The K Factor as a
Suppressor Variable in the Minnesota Multiphasic Personality
Inventory.” Journal of Applied Psychology, XXX (1946), 525-564.



REPRODUCIBLE SCALES AND THE ASSUMPTION 
OF NORMALITY 1 

ROBERT G. SMITH, Jr. 

University of Illinois 

The more commonly used statistical tests of hypotheses assume
that the universe of values, as measured, is normally distributed.
In some instances, the distribution of scores which an
investigator obtains from his sample gives him practically no
confidence that this condition is met. That considerable thought
has been given to this problem is clear from the recent review
by Mueller (6) of numerical transformations. The purpose of the
present paper is to examine some of the characteristics of the
relatively new technique of reproducible scales from the standpoint
of their use with statistics requiring normality assumptions.

The technique of “Scale Analysis,” originated by Guttman
(2), has attracted considerable attention, since it promises to
lead to the construction of tests which are unidimensional.
Loevinger (4, 5), in the area of tests of ability, has dealt with
the same problem in presenting techniques leading to the construction
of “Homogeneous Tests,” as she prefers to call them.

Tests are used for two major purposes: to order individuals
in the characteristic being measured, and to test hypotheses
concerning characteristics. The former may not involve the assumption
of normality; the latter requires this assumption if the
hypotheses are to be tested with statistical techniques such as
the critical ratio, t, and analysis of variance. Some deviation
from normality may not affect the precision of the statistical
tests to any great degree. If, however, the user of reproducible
scales has a distribution of scores which deviates strikingly from
normality, then the principle to be described in this paper may
permit him to approximate more closely a normal distribution.

1 The writer wishes to express his appreciation to Dr. L. L. McQuitty for his critical
comments on this paper.



That the assumption of population normality is not amiss in the
case of many reproducible scales gains support from research
in other forms of measurement. For instance, Thurstone (7, 8)
assumed normality for the purpose of developing scaling techniques
for tests of intelligence and for paired comparisons. He
was then able to test this assumption experimentally. The testing
of the assumption of normality in the area of reproducible
scales is a topic for further research.

The common feature of both the Guttman and Loevinger
techniques is that they aim to construct tests whose items make
perfect discriminations. In the case of a dichotomously scored
item, no one who fails the item should make a higher total score
than one who passes. With a multiple-response item such as is
used in attitude scales, no one giving a lower weighted response
should make a higher total score than one who gives a higher
weighted response. While the major emphasis in the various
techniques for “Scale Analysis” has been in reproducing responses
to individual items from the total score, it is possible
in a perfectly reproducible scale, since it gives perfect discriminations,
to deduce the distribution of the total score from
the number of individuals giving each response to each item.
Table 1 shows how this can be done.
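
For dichotomous items the deduction can be sketched in a few lines. The sketch assumes a perfectly reproducible scale, so that a person with total score k has passed exactly the k easiest items; the function name and the example proportions are hypothetical.

```python
def score_distribution(pass_proportions):
    """Total-score distribution of a perfectly reproducible dichotomous scale.

    pass_proportions: proportion of cases passing each item, in any order.
    Returns the proportion of cases at each total score 0..n.
    """
    p = sorted(pass_proportions, reverse=True)  # easiest item first
    bounds = [1.0] + p + [0.0]
    # Those scoring k pass the k-th easiest item but fail the next one.
    return [bounds[k] - bounds[k + 1] for k in range(len(p) + 1)]

# Hypothetical three-item scale passed by 90, 60 and 20 per cent of cases.
shares = score_distribution([0.6, 0.9, 0.2])
print([round(s, 2) for s in shares])
```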

This means that it will be possible, by the selection of items
with properly located cutting points, and by the combination
of categories in multiple-response items, to obtain a set of
items which, when combined, give a normal distribution of the
total score. Such a set of items is shown in Table 1. It will be
noted that the characteristic of a perfectly reproducible scale
which gives a normally distributed total score is that the scale
makes relatively few discriminations between individuals in the
center of the range of scores, and progressively more as the extremes
are approached. While it is, of course, unlikely that a
perfect normal curve will appear in practice, we should be able
to approximate normality.

If a sufficiently large pool of items is available, the investigator
may select the number of items he intends to use in the
scale. Then, if he wants, say, eleven items with cutting points
one-half sigma unit apart, reference to the table of area
under the normal curve will give him the proportions desired



TABLE 1

Item Responses and Total Scores of Perfectly Reproducible Test

[Body of table not recoverable: items by test score, with X's marking the cases passing each item.]

Each X represents 2% of the cases passing the item. Total scores were obtained by summing X's by columns.


in each item category. If he desires a different number of items, 
the same procedure may be followed. 2 
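
As an illustration of the eleven-item case, the proportions can be computed directly instead of being read from a printed table. The sketch assumes that the cutting points are centred on the mean, which the text does not state; the function names are illustrative.

```python
from statistics import NormalDist

def cutting_points(n_items=11, spacing=0.5):
    """Cutting points in sigma units, centred on the mean (an assumption)."""
    half = (n_items - 1) / 2
    return [(i - half) * spacing for i in range(n_items)]

def pass_proportions(cuts):
    """Proportion of a normal population lying above each cutting point."""
    unit_normal = NormalDist()
    return [1 - unit_normal.cdf(z) for z in cuts]

cuts = cutting_points()          # -2.5, -2.0, ..., 2.5
props = pass_proportions(cuts)   # easiest item first

# Share of cases at each total score 0..11 when the scale is perfectly
# reproducible: adjacent differences of the pass proportions.
bounds = [1.0] + props + [0.0]
score_shares = [bounds[k] - bounds[k + 1] for k in range(len(props) + 1)]
print([round(p, 3) for p in props])
```

With equal steps in sigma units the score frequencies rise toward the middle scores and fall off symmetrically, which is the shape the selection of items is meant to produce.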

As Guttman (3) has pointed out, the rank order of an individual
with regard to a scalable universe of content remains
invariant no matter which items are used. Therefore, the selection
of items to obtain a normally distributed total score will
in no way affect the other valuable properties of the scale. In
fact, the placing of restrictions on the location of the cutting
points may lead to more efficient scales.

Two scales may have equal reproducibility, but yet have
different characteristics as regards the differentiation of individuals.
(Compare Tables 1 and 2 in this respect.) It is recognized
that normal distributions may not be desirable in all purposes
to which tests may be put. However, for a given purpose,
if the distribution of cutting points be identical in two tests,
equal reproducibility will mean equal efficiency.

A recent use of analysis of variance with a reproducible scale
is the study of Gage (1). He, recognizing that the data shown
in his Table 15 did not form a normal distribution, was cautious
in the interpretation of his results. However, if he had had a larger
pool of items from which to draw, the selection of items with
properly located cutting points could have given him a normal
distribution of scores.

According to Guttman (2), one of the advantages of scaling
theory is that it does away with “untested and unnecessary
hypotheses about normal distributions.” Although normality
assumptions are not required for scale analysis itself, they may be
necessary in some of the uses to which scales are put. Therefore,
it is desirable to have a principle to use in achieving normal
distributions of total scores on reproducible scales and homogeneous
tests.

REFERENCES 

1. Gage, N. L. Scaling and Factorial Design in Opinion Poll Analysis.
Studies in Higher Education, No. 61. Lafayette, Ind.: Purdue
Univ., 1947.

2 After the present paper was completed, it was brought to the author’s attention
that this technique had been previously described by G. Hausknecht, in an unpublished
article, A Procedure for Determining a Useful Approximation to an Ideal Scale.
Hausknecht, however, does not bring out the importance of the distribution of cutting points
in determining scale efficiency.




2. Guttman, L. “A Basis for Scaling Qualitative Data.” American
Sociological Review, IX (1944), 139-150.

3. Guttman, L. Questions and Answers About Scale Analysis. Report
B-i, Research Branch, Information and Education Division,
Army Service Forces.

4. Loevinger, J. A Systematic Approach to the Construction and
Evaluation of Tests of Ability. Psychological Monograph, Vol. LXI,
No. 4, 1947.

5. Loevinger, J. “The Techniques of Homogeneous Tests Compared
with Some Aspects of Scale Analysis and Factor Analysis.”
Psychological Bulletin, XLV (1948), 507-529.

6. Mueller, C. G. “Numerical Transformations in the Analysis of
Experimental Data.” Psychological Bulletin, XLVI (1949), 198-223.

7. Thurstone, L. L. “Psychophysical Analysis.” American Journal of
Psychology, XXXVIII (1927), 368-389.

8. Thurstone, L. L. “The Unit of Measurement in Educational
Scales.” Journal of Educational Psychology, XVIII (1927),
505-514.



A FACTORIAL STUDY OF BELIEFS 1 


J. W. HOLLEY 
University of Southern California 
and 

CLAUDE E. BUXTON 
Yale University 

The use of tests of false beliefs is currently popular among
teachers of beginning psychology. Such tests serve to stimulate
the interest of students in the field of psychology and call attention
to many misconceptions and prejudices at the outset. The
investigation reported in this paper is concerned with this type
of test. Our task was to describe such false beliefs in terms of a
limited number of underlying variables, obtained by the method
of factor analysis. The results of such an investigation should be
of value to the teacher of beginning psychology, for they acquaint
him with the dimensions of misconception among students.
To students of psychometric techniques, the method
alone will be the object of concern.

Statistical background of the study.—In the factor-analysis approach,
the investigator may analyze a correlation matrix in
which either items or individuals function as variables. The
inter-individual correlation method, which has often been
adopted in factorial investigations in aesthetics, is known as
the “inverted” method of factor analysis. One reason for using
it in this currently unstructured field is that there are so many
available test items that a matrix of a corresponding order is
impractical. This is also true in the domain of lay beliefs about
behavior and about psychology. For this reason the “inverted”
method of factor analysis, or “Q technique,” was employed in
our study.

After solving a particular inverted factor-analysis problem,
we have, as a result, a matrix of common factor loadings. There

1 The first author wishes to express his appreciation to Professor W. Stephenson, 
visiting professor at the University of Chicago, for advice regarding the Q technique, 
particularly in relation to the importance of item difficulty in this method. 



is one row for each individual and one column for each common
factor. The square of each common-factor loading indicates
that portion of the total variance in any particular individual’s
beliefs which can be attributed to a single factorial source.

The sum of the common-factor variances for each individual
is known as his communality. In this study we may regard this
figure as an “index of agreement” between a particular individual
and the other individuals represented in the matrix. It
indicates the extent to which his evaluations of statements of
belief were determined, as were those of the other individuals.
The remaining portion of variance (+1.00 minus the communality
value) could be analyzed further into specific and error
variance. The specific variance (estimate of reliability for an
individual minus his communality) could be interpreted as an
“index of individuality” of beliefs, compared to those of other
individuals in the investigation. The error variance (+1.00
minus the reliability coefficient for the individual) would represent
the remaining portion of variance. In our investigation,
however, the reliabilities for the various individuals were not
obtained. Therefore, analysis beyond the determination of common-factor
variances making up the communality for each individual
was impossible.
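
The partition of variance described above can be sketched as follows. The loadings are the four centroid loadings reported for individual 7 in this study; the reliability of .90 in the second call is purely hypothetical, since no reliabilities were obtained.

```python
def communality(loadings):
    """Sum of squared common-factor loadings for one individual."""
    return sum(a * a for a in loadings)

def variance_partition(loadings, reliability=None):
    """Split unit variance into common, specific and error portions."""
    h2 = communality(loadings)
    parts = {"communality": h2, "remainder": 1.0 - h2}
    if reliability is not None:
        parts["specific"] = reliability - h2   # index of individuality
        parts["error"] = 1.0 - reliability
    return parts

ind7 = variance_partition([0.646, 0.351, 0.308, 0.274])
ind7_hypothetical = variance_partition([0.646, 0.351, 0.308, 0.274], reliability=0.90)
print(round(ind7["communality"], 3))
```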

Procedure.—The second author has constructed, over a period
of years, a 100-item true-false test of misconceptions. 2 The
items retained, for successive editions, were those which showed
some biserial correlation with the total score on the test, were
not passed or failed by all subjects, and were worded so that as
many correct responses were true as were false. Thus, the typical
method of finding all of the items students would fail was
not used to build this test. This questionnaire is currently used
in the beginning classes on the Evanston campus of Northwestern
University.

From a group of 500 test papers, secured in the fall of 1948,
30 were randomly selected. From these 30, 20 papers were
finally selected to function as the basic variables of the correlation
matrix. (The 10 papers which were eliminated were those


2 Some of the items were taken from Valentine (5), some from Garrett and Fisher
(1), and some were obtained personally from C. d’A. Gerken of the University of Iowa.
A mimeographed copy of this test may be obtained by writing to the second author.




with the most extreme scores, i.e., those individuals who either
answered almost all or very few of the items correctly. The
reason for this final selection was that we wished to avoid cell
entries, in the tetrachoric correlations, which were close to
zero.)

A matrix of 20 variables was thus obtained from the tetrachoric
correlations of the scores of each individual with each of
the other nineteen individuals. This correlation matrix, with
individuals as variables, is presented in Table 1.
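
A rough sketch of one such inter-individual coefficient is given below. It uses the familiar cosine-pi approximation to the tetrachoric coefficient, which is a stand-in and not necessarily the computing method used in this study; the function names are illustrative. It also shows why papers with nearly empty cells were avoided.

```python
import math

def fourfold_counts(x, y):
    """Counts of the four cells for two individuals' item scores (1 = correct)."""
    a = sum(1 for i, j in zip(x, y) if i and j)          # both correct
    b = sum(1 for i, j in zip(x, y) if i and not j)      # first only
    c = sum(1 for i, j in zip(x, y) if not i and j)      # second only
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # both wrong
    return a, b, c, d

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation."""
    if a * d == 0 or b * c == 0:
        raise ValueError("a fourfold cell is empty; the estimate is unstable")
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))

# Hypothetical pair of individuals agreeing on 80 of 100 items.
r = tetrachoric_cos_pi(40, 10, 10, 40)
print(round(r, 3))
```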


TABLE 2

Centroid Factor Loadings (and Communalities)

Individuals   Factor I   Factor II   Factor III   Factor IV   Communalities

     1          .582       -.383       -.029        -.318         .589
     2          .497        .120       -.378        -.333         .517
     3          .726        .197        .084         .213         .619
     4          .605        .338       -.500        -.149         .756
     5          .351       -.238       -.268         .294         .338
     6          .655       -.494       -.261         .123         .756
     7          .646        .351        .308         .274         .710
     8          .386       -.267        .076        -.474         .451
     9          .416        .286        .419         .063         .435
    10          .419        .248       -.279        -.250         .377
    11          .394       -.233        .319        -.148         .333
    12          .396       -.334        .167         .234         .351
    13          .598       -.132       -.108         .113         .402
    14          .669        .216        .110         .023         .507
    15          .529       -.103       -.106         .267         .373
    16          .488        .117       -.068         .146         .278
    17          .313       -.303        .312        -.189         .324
    18          .943       -.091       -.103         .089         .916
    19          .521        .341        .097         .271         .471
    20          .496        .368        .207        -.271         .498


From this correlation matrix, four centroid factors (see Table
2) were extracted according to Thurstone’s centroid method of
factor analysis (2). The reference axes were then rotated following
Zimmerman’s graphic method (6), so as to minimize the
number of zero loadings. The rotations are presented in Table
3, while the final rotated factor loadings are presented in Table
4.
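
The four plane rotations can be applied numerically as a check. The sketch below takes the angles as read from Table 3 (46, 50 and 57 degrees counter-clockwise, then 27 degrees clockwise) and the centroid loadings reported for individual 18. The sign convention in `rotate_pair` is an assumption, chosen because it reproduces the reported rotated loadings to within rounding.

```python
import math

def rotate_pair(loadings, i, j, degrees):
    """Rotate the plane of axes i and j; positive degrees = counter-clockwise."""
    t = math.radians(degrees)
    out = list(loadings)
    out[i] = loadings[i] * math.cos(t) - loadings[j] * math.sin(t)
    out[j] = loadings[i] * math.sin(t) + loadings[j] * math.cos(t)
    return out

# (axis, axis, degrees); Factors I to IV are indices 0 to 3.
ROTATIONS = [(0, 1, 46), (0, 3, 50), (0, 2, 57), (0, 1, -27)]

def rotate_all(loadings):
    """Apply the four rotations in the order listed."""
    for i, j, degrees in ROTATIONS:
        loadings = rotate_pair(loadings, i, j, degrees)
    return loadings

# Centroid loadings reported for individual 18.
ind18 = rotate_all([0.943, -0.091, -0.103, 0.089])
print([round(a, 3) for a in ind18])
```

Because each step is an orthogonal rotation, the sum of squared loadings (the communality) is unchanged.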

Interpretation of factors.—An important problem in the use of
the “Q technique” is determining the meaning of the extracted
factors. The rotated factor loadings, by themselves, tell us very
little about the nature of the factors, for they merely indicate
the rank order of the individuals in regard to these dimensions.










Such a rank order leaves much to be desired in clarifying such
meanings.

Stephenson (3) attempted to solve this problem in the case of
aesthetic judgments by interrogating the subjects in regard to
their preferences and by observing the judgments of those individuals
who were highly saturated in one factor. Guilford and
Holley (2) employed a system of weighted judgments. They


TABLE 3

Rotation of Centroid Axes

Axes            Degrees     Direction

I & II            46        counter-clockwise
I' & IV           50        counter-clockwise
I'' & III         57        counter-clockwise
I''' & II'        27        clockwise




TABLE 4

Rotated Factor Loadings

Individuals   Factor I   Factor II   Factor III   Factor IV

     1          .421       -.044        .556         .318
     2          .687        .145        .149        -.017
     3          .270        .602        .103         .415
     4          .790        .351       -.080         .041
     5          .260       -.035       -.111         .507
     6          .461       -.091        .216         .699
     7          .052        .770        .098         .327
     8          .305       -.051        .594         .047
     9         -.085        .602        .231         .106
    10          .551        .249        .070        -.074
    11          .009        .133        .507         .244
    12         -.027        .073        .218         .545
    13          .346        .202        .136         .471
    14          .292        .559        .212         .252
    15          .257        .216        .007         .510
    16          .272        .346        .007         .289
    17         -.022        .026        .526         .213
    18          .547        .411        .275         .609
    19          .142        .614       -.059         .264
    20          .249        .561        .330         .113


obtained the product of the factor loading of the individual by
the rating given by the individual to a particular object. From
these scores for the various objects in the aesthetics study
reported by these investigators, it was possible to arrange the
objects according to the magnitude of these scores for the
various factors, and, thus, to name the underlying variables.
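The weighting procedure attributed to Guilford and Holley can be sketched roughly as follows. The text states only that each individual's factor loading was multiplied by that individual's rating of an object; summing these products over individuals to obtain one score per object is an assumption made here for illustration, and the loadings and ratings are hypothetical.

```python
def weighted_object_scores(loadings, ratings):
    """Return one weighted score per object for a single factor.

    loadings[i]   -- factor loading of individual i (hypothetical values)
    ratings[i][j] -- rating individual i gave to object j (hypothetical values)
    Assumption: products are summed over individuals.
    """
    n_objects = len(ratings[0])
    return [
        sum(load * row[j] for load, row in zip(loadings, ratings))
        for j in range(n_objects)
    ]


# Three hypothetical individuals rating three hypothetical objects.
loadings = [0.8, 0.6, 0.1]
ratings = [
    [5, 1, 3],
    [4, 2, 3],
    [1, 5, 3],
]

scores = weighted_object_scores(loadings, ratings)

# Objects ordered from highest to lowest weighted score for this factor.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Objects at the top of such a ranking are those preferred by the individuals most saturated in the factor, which is what allows the factor to be named.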

In order to determine the nature of the factors extracted in
our investigation, biserial correlations were obtained, for each
item, between whether or not the item was passed by the various
individuals and the factor loadings of these individuals. A
perfect correlation, then, would be one in which all individuals
who missed a particular item also obtained the highest factor
loadings. A perfect correlation of an item with the loadings of a
particular factor would mean that it measured individual
differences maximally in regard to that particular dimension.³
That is, it would differentiate, most efficiently, those individuals
with high factor loadings from those with low factor loadings.
Groups of items which differentiated maximally were used as
the basis for naming the factors; that is, we selected clusters of
items with the highest correlations and observed the common
element among them.
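The item-by-factor correlations described above can be illustrated with a minimal sketch of the standard biserial formula, r = ((M1 - M0) / s) (pq / y), where M1 and M0 are the mean loadings of those who missed and passed the item, s is the standard deviation of all loadings, p and q are the two proportions, and y is the normal ordinate at the point of division. Coding "missed" as the upper group follows the sign convention stated in the text; the use of the population standard deviation and the data themselves are assumptions for illustration only.

```python
from statistics import NormalDist, fmean, pstdev


def biserial(missed, loadings):
    """Biserial correlation between a pass/miss item and factor loadings.

    missed[i]   -- True if individual i missed the item
    loadings[i] -- factor loading of individual i
    A positive value means the item tended to be missed by individuals
    with high loadings.
    """
    upper = [x for m, x in zip(missed, loadings) if m]
    lower = [x for m, x in zip(missed, loadings) if not m]
    if not upper or not lower:
        raise ValueError("the item must be both missed and passed by someone")
    p = len(upper) / len(loadings)   # proportion missing the item
    q = 1.0 - p                      # proportion passing the item
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(p))  # ordinate at the point of dichotomy
    s = pstdev(loadings)               # assumption: population SD of all loadings
    return (fmean(upper) - fmean(lower)) / s * (p * q / y)


# Hypothetical data: eight individuals, three of whom missed the item.
missed = [True, True, True, False, False, False, False, False]
loadings = [0.79, 0.69, 0.55, 0.46, 0.27, 0.26, 0.05, -0.03]

r_item = biserial(missed, loadings)
```

With very small groups the biserial coefficient is unstable and can even exceed unity, which should be kept in mind when reading coefficients based on only twenty individuals.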

Identification of factors.—In the description of the items
below, a positive correlation indicates that the item tended to be
passed by those individuals with low factor loadings but missed
by those individuals with high factor loadings. In the case of
negative correlations, the converse is true. Since our factors
represent areas of misconception, the positively correlated items
are most useful as descriptive of false belief, while the negatively
correlated items are most useful when contrasted to these. As a
convenience, the biserial correlation of an item, together with
its scoring key and its level of difficulty as indicated by the
number of individuals missing the item, will be presented for
each statement which is quoted.

Factor I.—This seems to be a factor of general psychological
naïveté. It indicates a lack of technical knowledge about
psychology. Those items which describe the factor most clearly are:

“The printing on this page is upside-down on your retina.”
(true) r = +.88 (5 missed)

³ In the Q technique, the factor loading of an individual does not represent the
amount of a certain factor present, if the concept of “amount” is defined in terms of
expected scores from factorially pure tests. The reason for this is that the Q technique
assumes that the means (in this case of misconceptions) are equal to zero and that the
variabilities are equal to one for all individuals. The squared factor loading in the Q
technique represents that portion of the variance in the misconceptions which the
individual has which correlates with the various factors. If, then, we wanted to know “how
much” (as defined by the subsequent scores on a hypothetical “pure factor” test of
this dimension), we would have to adjust the squared factor loadings for the amount
of misconception, as indicated by their total scores, and for the variability of the
individual's scores. This adjustment of variances was not carried out in this particular
study, although it should be in subsequent studies of this type. It was felt that the
individual differences in the means and variabilities were not sufficiently great to
necessitate a reworking of the data.





“Rats, cats, and dogs have the power to reason.” (true)
r = +.84 (15 missed)

“There is little that psychology can do for the normal person.”
(false) r = +.81 (2 missed)

Factor II.—This seems to be a “knowledge of special
terminology” factor. The items which were missed on this factor
contained terms whose meanings are not clear to the layman. Items
with high correlations are:

“Half the people in this country are below average in
intelligence.” (true) r = +.98 (9 missed)

“The unconscious mind is located just above the roof of the
mouth, directly back of the nose.” (false) r = +.82 (6 missed)

It will be noticed that both of these statements require special
knowledge about the terms contained in them. The terms “average
in intelligence” and “unconscious mind,” while familiar to
psychologists, do involve a terminology above the level to be
expected of the layman.

In contrast to these are the two statements which have the
highest negative correlations:

“Cats can see in complete darkness.” (false) r = -.81
(6 missed)

“A dog can sense impending disaster better than a man.”
(false) r = -.81 (14 missed)

While these last two statements do require a kind of special
knowledge, namely that pertaining to perception, there is no
problem of terminology here.

Factor III.—This factor appears to be the clearest of the four.
It has been labelled “conventional morality.” The items with
the highest positive correlations are:

“The majority of adult criminals are feeble-minded or very
nearly so.” (false) r = +.88 (4 missed)

“A child is born with a sense of good and evil—this is his
conscience.” (false) r = +.80 (6 missed)

“Being spanked may be pleasurable to a child.” (true)
r = +.80 (7 missed)

“A person who won't look you in the eye is probably
untrustworthy.” (false) r = +.78 (1 missed)

Individuals high in this factor seem to have misconceptions
about good and evil. They seem to look upon the conscience as
something which is inborn. They appear to regard “bad”
behavior as being more modifiable through the “will” and
intellectual choice of the individual than factual evidence would
justify.

Factor IV.—This seems to indicate an “over-evaluation of
learning ability,” particularly of children. Items with the
highest positive correlation are as follows:

“Children memorize much more easily than adults.” (false)
r = +.98 (14 missed)

“The average infant would learn to walk two months earlier
than he does, if he were given the proper training.” (false)
r = +.91 (12 missed)

“The sense organs of touch, in a person with normal vision,
are just as sensitive as those in a blind person.” (true) r = +.72
(14 missed)

It is interesting to note that the item “It is probable that
man's instinct to fight is the fundamental cause of wars.”
(false) had a correlation of -.71 (5 missed).

Thus, it is possible to determine the factors of a given area
and to carry out item analyses for hundreds of items from data
from only a relatively few subjects. We may know how well
each item measures each factor, as well as the level of difficulty
of each item. For these reasons, this technique is particularly
recommended for use in relatively unexplored areas such as
aesthetics and ethics, where the investigator is faced with the
problem of establishing the principal dimensions from an almost
infinitely large number of items. To construct tests in an
unexplored domain is costly and time consuming, particularly
when the investigator does not know which items to start with.
In the method suggested in this paper the investigator starts
with every kind of item which he thinks might measure
something within the domain being considered. The results give him
a rough idea of what the basic dimensions are. He also knows
what groups of items are the best measures of these dimensions.
He may then start a further analysis of the area, building his
tests in the direction of the clusters and using the clusters of
selected items as the basis for the selection of similar kinds of
items.

To demonstrate this method of screening, eight items were
selected from the original 100. Each factor was represented by
two items. Each of these items had a high correlation with the
factor it represented, but low correlations with the other factors.
Each of these items was correlated with the other items, using
tetrachoric coefficients on the basis of the scores of the twenty
individuals who constituted the twenty variables of the original
matrix. This matrix, in which the items are the variables, is
presented in Table 5. The variables are grouped according to the


TABLE 5

Intercorrelations of Items

                  (Factor III)    (Factor IV)    (Factor II)    (Factor I)
         Item      1      2        3      4       5      6       7      8
(III)      1
           2     +.75
(IV)       3     -.08   +.10
           4     -.22   -.18     +.55
(II)       5     -.60   +.08     -.45   -.13
           6     -.35   -.10     -.75   -.11    +.75
(I)        7     -.68   -.41     -.35    .00    -.16   -.36
           8     -.21    .00     +.60   +.40    -.30   -.81    +.58





TABLE 6

Centroid Factor Loadings (For Items)

Item    Factor I    Factor II
  1       .66?        -.753
  2       .394         .604
  3       .818         .336
  4       .?55         .3?6
  5      -.545        -.164
  6      -.847        -.486
  7      -.212         .678
  8       .491         .695


factors which they represent. It is interesting to note at this 
point that variable 8 is the only one which has a significantly 
high positive correlation with any factor other than the one it 
was selected to represent (factor I). It is also of interest to know 
that this item has a positive biserial correlation with factor IV, 
while the two items representing factor IV have positive biserial 
correlations with factor I which are considerably above average. 

Thurstone's centroid method of factor analysis was then
used to extract two centroid factors (Table 6). The fact that
only twenty cases were used in the calculations of the
tetrachoric correlations of the matrix placed a limit upon the number
of factors that might have been legitimately extracted without
going below the threshold of error variance. It is of interest to
note, however, that the pattern on the axes of the two extracted
factors consisted of four clusters of items. With the exception of
variable 8, each variable is found with the other member of the
pair representative of each factor. Variable 8 had significant
loadings on both factor I and factor IV. The projections on the
two rotated axes are shown in Figure 1.

Fig. 1. Projections of Factor Loadings upon Centroid and Rotated Axes

Summary.—This study was undertaken primarily as a
demonstration of methodology, although the factors obtained have
a pedagogical utility. The inverted factor technique was
employed so that the extracted factor loadings represented scale
values for individuals in regard to these factors. The various
items were then correlated with the factor loadings, and the
factors were described by those clusters of items which had the
highest correlations. The factors which emerged were fairly
clear cut. Two items were then selected for each dimension
which were highly saturated in that factor. When these eight
items were factor analyzed, a pattern of four clusters in two
dimensions emerged.

It is suggested that this type of approach be used in
unstructured domains in order to obtain a rough idea of the kinds
of items which might be used for the further factorial
investigation of such areas.


REFERENCES

1. Garrett, H. E. and Fisher, T. F. “Prevalence of Certain Popular
   Misconceptions.” Journal of Applied Psychology, X (1926),
   411-420.

2. Guilford, J. P. and Holley, J. W. “A Factorial Approach to the
   Analysis of Variances in Esthetic Judgments.” Journal of
   Experimental Psychology, XXXIX (1949), 208-218.

3. Stephenson, W. “The Inverted Factor Technique.” British Journal
   of Psychology, XXVI (1935-36), 344-361.

4. Thurstone, L. L. Multiple Factor Analysis. Chicago: Univ. of
   Chicago Press, 1947.

5. Valentine, W. L. “Common Misconceptions of College Students.”
   Journal of Applied Psychology, XX (1936), 633-658.

6. Zimmerman, W. “A Simple Method of Orthogonal Rotation of
   Axes.” Psychometrika, XI (1946), 51-55.



OPINION AND ACTION: A STUDY IN VALIDITY OF 
ATTITUDE MEASUREMENT 


C. ROBERT PACE 
Syracuse University 

The relationship between opinion and action is a practical 
topic which has rather basic theoretical importance as well. 

Opinion measurement has been attempted by a variety of
scientists rather than by a concentration of talent in any single
discipline. Thus, we find different techniques employed in public
opinion polls, market research, studies of morale, management
and job satisfaction, and in education. Political scientists,
sociologists, social, clinical, personnel, and educational
psychologists, specialists in educational research, and specialists in
measurement and evaluation have all made some contribution.
While this diversity of approach may be advantageous, it is
equally likely that some confusion and superficiality have
resulted. Ample documentation of the latter was given in
McNemar's (1) critical review of attitude-opinion methodology
three years ago. McNemar also stated that relatively few
validity studies had been made of attitude and opinion measuring
instruments.

Most definitions of attitude accept the proposition that an
attitude is a tendency to act for or against some object or value.
Most definitions of psychology describe psychology as the
science of behavior, or concerned with the prediction and control
of behavior. Advances in science are related to the precision of
scientific measuring instruments; the value of a measuring
instrument is determined in large part by what you can do with
the result obtained from it; and what you can do with the result
depends on what relationships are known to exist between it
and other variables. Thus, the value of an IQ resides largely in
the fact that, having it, you can predict a person's behavior or
status in quite a variety of circumstances. Likewise, the value
of an attitude measurement is largely dependent on knowing
what behavior is associated with it. Opinions are the verbalized
expression of attitudes; opinions are not action. But, certainly,
some opinions should be correlated with action, just as some
aspects of college achievement should be correlated with scores
on a college aptitude test.

An opportunity to analyze some data bearing on this problem
of opinion validity was provided by the replies of some 2,500
Syracuse University alumni to a sixteen-page questionnaire
(2, 3). This alumni follow-up study was an attempt to describe
our educational product rather fully, examining his behavior
with respect to some of the major objectives of general
education in science, social science, and the humanities. The
questionnaire included seven Activity Scales of eleven items each,
labelled Politics, Civic Affairs, Religion, Art, Music, Literature,
and Science. The subjects checked each activity they had
engaged in during the past year. The scales have the property of
Guttman-type scales in that participation in the more difficult
activities tends to subsume participation in the easier and more
common activities. The score on each scale was simply the
number of activities checked.

Then we had nine Opinion Scales of
six items each, labelled Politics, Civic Relations, Government,
the World, Philosophy, Art, Music, Literature, and Science.
The statements in the opinion scales were written to reflect
basic concepts, insights, or appreciations which are among the
objectives of general education. Each statement was answered
on a five-point scale, from Strongly Agree to Strongly Disagree.
Faculty experts in the fields sampled by the opinion scales
tended to agree among themselves in their responses to the
items and so it was possible to score each scale simply by
counting the number of statements on which one's opinion agreed with
the opinions of the experts. With only two exceptions, for every
statement included in the scales the degree of consensus among
answers of the experts exceeded 2 to 1, and for 80 per cent of the
items the ratio exceeded 4 to 1. In another section of the
questionnaire, we had a list of eighteen objectives of general
education, which the alumni rated on a five-point scale of importance,
from “very important” to “of no importance.” These ratings,
of course, are also measures of opinion.
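The two scoring rules just described can be sketched as follows. The text says only that an opinion score is the number of statements on which the respondent agreed with the experts and that an activity score is the number of activities checked; treating "agreement" as falling on the same side of the five-point scale as the expert consensus, with the neutral midpoint counted as no agreement, is an assumption made for this illustration, as are all names and data.

```python
def side(response):
    """Collapse a five-point response (1 = Strongly Agree ... 5 = Strongly
    Disagree) into a direction; the midpoint has no direction."""
    if response in (1, 2):
        return "agree"
    if response in (4, 5):
        return "disagree"
    return None


def opinion_score(responses, expert_key):
    """Count statements on which the respondent sides with the experts.

    expert_key[j] is "agree" or "disagree" (hypothetical consensus key).
    """
    return sum(1 for r, key in zip(responses, expert_key) if side(r) == key)


def activity_score(checked):
    """Count the activities checked on one eleven-item Activity Scale."""
    return sum(1 for c in checked if c)


# Hypothetical six-item Opinion Scale and eleven-item Activity Scale.
expert_key = ["agree", "disagree", "agree", "disagree", "disagree", "agree"]
responses = [1, 2, 3, 4, 5, 1]
checked = [True, True, True, False, True, False,
           False, False, False, False, False]

example_opinion = opinion_score(responses, expert_key)
example_activity = activity_score(checked)
```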

Before reporting correlations between attitudes and
activities, it is appropriate to note the reliability of the scales and the
items, for this obviously affects the size of any correlation
between them. Six months after our sample of 2,500 had filled out
the questionnaire (this represented a 50 per cent return from
those who had received it) we sent a second copy of the
questionnaire to a small group of 120, receiving 68 in return. The
test-retest consistency of scores over this six-months interval
was computed, using Pearson product-moment correlations.
For the Activity Scales, these ranged from .70 to .89 with a
median r of .83. For the nine Opinion Scales, the median
correlation was .65, with seven falling between .60 and .70, and
two very low ones, .40 and .31. Then we also checked the
consistency of responses item by item. For the Activity Scales the
average per cent of identical responses was 85, with a range
from 83 to 87. For the Opinion Scales the average per cent of
identical responses was 75 with a range from 68 to 84.

TABLE 1

Correlations Between Scores on Activity Scales and Scores on Opinion Scales
(N = c. 2500)

Scales                                                     Correlation
Political Activity Score vs Political Opinion Score            .15
Civic Activity Score vs Civic Opinion Score                    .01
Religious Activity Score vs Philosophy Opinion Score           .29
Art Activity Score vs Art Opinion Score                        .37
Music Activity Score vs Music Opinion Score                    .40
Literature Activity Score vs Literature Opinion Score          .33
Science Activity Score vs Science Opinion Score                .14
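The two reliability checks mentioned above, a Pearson product-moment correlation between first and second administrations and the per cent of identical item responses, might be computed along the following lines. This is only a sketch with hypothetical scores; it is not the author's worksheet.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_identical(first, second):
    """Per cent of items answered identically on the two occasions."""
    same = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * same / len(first)


# Hypothetical scale scores for six respondents, six months apart.
first_scores = [3, 7, 5, 9, 2, 6]
second_scores = [4, 6, 5, 8, 2, 7]
retest_r = pearson_r(first_scores, second_scores)

# Hypothetical item responses (1 = checked) for one respondent.
identical = percent_identical([1, 0, 1, 1], [1, 0, 0, 1])
```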

Correlations between activity and opinion scores are listed in
Table 1. The Political Opinion Scale was designed to measure
one's belief in the value and importance of individual and group
participation in a representative government. The Political
Activity Scale is, presumably, a measure of the extent of
participation in various political processes, such as discussing and
reading about political matters, voting, writing letters, signing
petitions, giving and collecting money, etc. One might expect the
correlation between two such scales to be considerably higher
than .15. The Civic Opinion Scale was intended as a measure of
tolerance and acceptance of equality of opportunity for all
people. The Civic Activity Scale was intended as a measure of
community participation. The Philosophy Opinion Scale is
concerned with acceptance of a Christian and ethical set of values.
The Religious Activity Scale is concerned mainly with
participation in church-related activities. The opinion scales in Art,
Music, and Literature were intended to measure the general
sophistication and maturity of understanding in art, music,
and literature. The corresponding activity scales were designed
to reveal the frequency and depth of engagement in activities
related to art, music, and literature.

All these correlations are small, ranging from a low of .01 to a
high of .40. We did not construct the scales with the sole thought
of correlation between activities and opinions, although we
certainly hoped that the people whose opinions reflected the
greatest insight and understanding in the various fields would tend
also to be most active in those fields. Such a tendency appears to
a limited degree in art, music, literature, and religion, but is
practically non-existent in politics, civic affairs, and science.

Some of the individual opinion items can appropriately be
paired with a corresponding activity item; for other opinions
and activities it seemed less reasonable to expect any
correspondence. Looking through the questionnaire, I selected 27
opinion statements against which it seemed plausible to
compare one or more of 39 activity items. Altogether I had 188
pairs of activity and opinion. For a simple calculation of
relationship, I used Thurstone's tables for estimating tetrachoric
correlation coefficients. Seventy correlations have been
computed and they are the ones which seemed most likely to show
some correspondence between opinion and action.
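For readers without access to such tables, a rough feel for a tetrachoric coefficient can be had from the familiar "cosine-pi" approximation, r ≈ cos(π / (1 + √(ad/bc))), applied to the fourfold table of an opinion item against an activity item. This is not the tabular method used in the study; it is a substitute shown only for illustration, it is reasonably close only when both items split near 50-50, and the cell counts below are hypothetical.

```python
from math import cos, pi, sqrt


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a -- holds the opinion and engaged in the activity
    b -- holds the opinion, did not engage in the activity
    c -- does not hold the opinion, engaged in the activity
    d -- does not hold the opinion, did not engage in the activity
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the approximation needs four non-empty cells")
    return cos(pi / (1.0 + sqrt((a * d) / (b * c))))


# Hypothetical fourfold tables.
no_relation = tetrachoric_cos_pi(25, 25, 25, 25)
strong_positive = tetrachoric_cos_pi(40, 10, 10, 40)
strong_negative = tetrachoric_cos_pi(10, 40, 40, 10)
```

When the cross-products ad and bc are equal the estimate is zero, and it approaches +1 or -1 as one diagonal of the table empties.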

A distribution of the 70 correlations shows a median value of 
.18, with a fourth at .07 or below, and another fourth at .30 and 
higher. The lowest was —.05. The highest was +.54. 

All of the correlations above .30 came from the fields of art, 
music and religion; none came from politics, civic affairs, or 
science. Literature was not included in these comparisons. 

Selected examples of these correlations are shown in Table 2.
It is clear that participation in various church-related and other
religious activities is definitely correlated with having a
favorable opinion toward the significance and importance of religion;
but these activities are less clearly related to more general
opinions about the importance of understanding the meaning and
values in life. Opinions about art and music which reflect a
sophisticated and mature understanding and interest tend to be
accompanied by participation in various art and music
activities. In the field of politics, on the other hand, the relations
between opinion and action approach zero.

TABLE 2

Correlations Between Specific Opinions and Specific Actions

Opinions
    Actions                                                 Correlation

PHILOSOPHY and RELIGION

Disagree with the statement that: Religion has little to offer
intelligent and scientific people today
    I belonged to a church                                      .53
    I contributed a regular sum of money to a church            .43
    I served on some volunteer church committee                 .30
    I prayed                                                    .54
    I read selections from my Bible                             .24

Rate very important as objective of college education:
Understanding the meaning and values of life
    I belonged to a church                                      .00
    I contributed . . .                                         .03
    I prayed                                                    .15
    I read . . . Bible                                          .19

ART and MUSIC

Disagree with the statement that: Modern painting
(impressionism, expressionism, cubism, surrealism, and the rest)
is mostly the work of crackpots.
    I visited an art gallery or museum                          .14
    I attended an exhibition of contemporary painting           .27
    I read one or more books about art, artists, or art
      history                                                   .38

Rate very important or important as objective of college
education: Developing an understanding and enjoyment of art
and music
    I visited an art gallery . . .                              .31
    I attended an exhibition of contemporary painting           .45
    I read one or more books about art . . .                    .3?
    I listened to some serious music by contemporary
      composers                                                 .34
    I listened to symphony programs on my radio at least
      once a month                                              .37
    I read one or more books about music, musicians, or
      music history                                             .37

Disagree with the statement that: The tendency of some modern
composers to use strange harmonies and discords makes for poor
music
    I listened to . . . serious contemporary music . . .        .35
    I listened to symphony programs                             .20
    I read . . . books about music . . .                        .10

Disagree with the statement that: There has been little or no
outstanding music composed in the 20th Century
    I listened to . . . serious contemporary music . . .        .40
    I listened to symphony programs                             .27
    I read . . . books about music . . .                        .08

Agree with the statement that: Radio should give people much
more opportunity to hear good serious music
    I listened to . . . serious contemporary music . . .        .32
    I listened to symphony programs                             .42
    I subscribed to some orchestral or musical concert
      series                                                    .33

POLITICS

Disagree with the statement that: Sending letters and telegrams
to congressmen has little influence on legislation
    I wrote a letter or sent a telegram to a public official    .11

Agree with statement that: Pressure groups are useful and
important features of democratic government
    I wrote a letter or sent a telegram to a public official    .10
    I signed a petition for or against some legislation         .06
    I contributed money to some political cause or group        .10

Rate very important as objective of college education: How to
participate effectively as a citizen
    I voted in the last primary or local election
    I signed a petition . . .
    I wrote a letter or telegram . . .
    I contributed money . . .
    (legible correlations for this group, in order: .03, .10, .05)

Rate very important as objective of college education:
Understanding world issues and pressing social, political, and
economic problems
    I listened at least once a month to speeches and
      discussion programs on the radio dealing with national
      and international problems                                .26
    I read one or more books about politics                     .18

An interesting phenomenon occurs in many of these
comparisons between specific opinions and specific actions. People
who hold their opinions “strongly” tend to engage in the related
activities whether it makes sense or not. For example, among
those who feel strongly that sending letters and telegrams has
some influence, 37 per cent wrote a letter or sent a telegram.
Among those who agree that it has some influence but do not
feel strongly about it, 23 per cent wrote a letter or sent a
telegram. Among those who had no opinion one way or the other,
10 per cent engaged in the activity. Then, among those who
tended to think it had little influence, 18 per cent did it anyway;
and among people who were convinced it had little influence,
33 per cent engaged in the activity. Another example: among
people who feel strongly that religion does have something to
offer intelligent people today, 86 per cent belonged to a church
and 76 per cent contributed a regular sum of money to a church.
Among those who agree, but not strongly, that religion is
worthwhile today, 72 per cent are church members and 56 per cent
contribute money to the church. Among those who have no
opinion one way or another, 41 per cent belong to a church and
33 per cent contribute money. But, among those who are
convinced that religion has little to offer, 54 per cent belong to a
church and 47 per cent contribute money regularly to it. For
the activity “I prayed,” the percentages drop from 86 to 34 and
then rise to 54.
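The letter-writing percentages quoted above show why a U-shaped relation depresses an ordinary correlation. The sketch below correlates the five reported percentages first with the direction of opinion and then with its intensity (distance from the neutral midpoint). Because the number of respondents in each opinion category is not reported, the five categories are given equal weight; that weighting and the numerical coding of the categories are assumptions for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Opinion categories coded 1 (strongly: letters have influence) through
# 5 (convinced: letters have little influence); 3 is "no opinion".
opinion_code = [1, 2, 3, 4, 5]

# Per cent in each category who wrote a letter or sent a telegram (from the text).
percent_wrote = [37, 23, 10, 18, 33]

# Intensity of opinion, ignoring its direction.
intensity = [abs(code - 3) for code in opinion_code]

r_direction = pearson_r(opinion_code, percent_wrote)
r_intensity = pearson_r(intensity, percent_wrote)
```

Under these assumptions the correlation with direction is about -.19, while the correlation with intensity is about .97, which is consistent with the observation that strength of conviction, rather than its direction, accompanies action here.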

In general, opinions regarding the importance of the various
goals of higher education do not exhibit this U-shaped curve
in relation to participation in the corresponding activities. For
example, among people who rated “Understanding world issues
and pressing social, political, and economic problems” as “very
important,” 82 per cent listened to radio speeches and
discussions at least once a month. Among those who rated it as
“important,” 71 per cent listened. Among people who thought
it was “of some importance,” 64 per cent listened, and among
people who thought it was of little or no importance, 50 per
cent listened. With respect to “I read one or more books about
politics” the corresponding percentages were 27, 15, 9, and
zero.

Or take an illustration from Art. Forty-seven per cent of the
people who rated “Developing an understanding and
enjoyment of art and music” as “very important” said they had
attended an exhibition of contemporary painting. Only 6 per cent
of those who considered this to be of little or no importance
had attended such an exhibition. Also, in music activities, of
those who rated the objective very important, 84 per cent
listened to radio symphonies at least once a month in contrast
to 46 per cent among those who regarded the objective as of
little or no importance.

What conclusions can we draw from these figures? There
seems to be some correlation, generally in the .20's and .30's,
between belief in the importance of some field and
participation in activities in that field. This was true of Art and Music,
and to a lesser extent of Religion and Politics. It is also true of
science, although I have not reported those correlations. There
seems to be a reasonable correlation between specific opinions
and specific actions in Art, Music, and Religion, again
generally in the .30's and .40's. In politics, however, I found no
correlation higher than +.15 between a specific opinion and a
specific action which might be expected to be associated with it.

Many of the correlations in this study may be thought of as
rather high. This is so if one considers the probability that
there may be an additive or reinforcing effect among related
opinions and the further probability that such factors as
opportunity for action, multiple actions, and variations in
intensity of opinion all may serve to depress the size of
correlations between single opinions and single actions. Moreover,
tetrachoric coefficients tend to be lower than Pearson
product-moment coefficients. The present study is primarily exploratory
rather than analytical: it reports relationships in a wide range
of fields based on data designed broadly to throw light on the
status of the educational product rather than data specifically
collected to analyze relationships between opinions and actions.
Yet so limited is our knowledge of the validity of many opinion
measurements that one of our basic needs is to collect all the
information we can from whatever sources so that ultimately
critical analysis and theory can be more soundly attempted.

After the failure of the public opinion polls to predict the 1948
Presidential election, attention was focused anew on the
relationship between expressed opinion and behavior. The
pollsters were quick to claim that their failure in the election had
no bearing at all on the value of their regular reports describing
the public's attitude on a great variety of complex issues such
as labor relations, internationalism, European reconstruction,
relations with Russia, etc. The fact is, however, that there is
little or no published evidence of the relationship between such
attitudes and behavior. Until we have more evidence of the
relation between opinion and action, we must regard many of the
opinion polls and attitude surveys in the same way that we
regard most other magazine and newspaper reports, namely, as
interesting observations to be treated with a critical
open-mindedness.

Advances in the science of attitude measurement will come
in proportion to our ability to establish clear relationships
between opinion and action. Until we do this, our so-called
measurements will remain purely descriptive. What we must seek is
measurement that is both descriptive and predictive of
observable behavior.


REFERENCES

1. McNemar, Quinn. “Opinion-Attitude Methodology.” Psychological
   Bulletin, XLIII (1946), 289-374.

2. Pace, C. Robert. “Follow-Up Studies of College Graduates.”
   Growing Points in Educational Research: 1949 Official Report of the
   American Educational Research Association, pp. 285-290.

3. Pace, C. Robert. “What Kind of Citizens Do College Graduates
   Become?” Journal of General Education, III (1949), 197-202.



ESTIMATING INTELLIGENCE BY INTERVIEW 


JOSEPH V. HANNA 
New York University 

The interview had, until recently, been too long neglected 
among psychologists as yielding promising materials for re¬ 
search, This neglect, in the writer’s opinion, is due to two main 
causes. In the first place, several early and too sketchy ex¬ 
periments yielded results which tended to establish that inter¬ 
viewing techniques and methods were not sufficiently valid to 
be taken seriously (5, 8, 14). The results of these studies were 
widely quoted by influential writers, and undoubtedly had the 
effect of restraining younger clinical and applied psychologists 
from initiating research projects aiming at the appraisal of 
interviewing methods and skills. It is a strange paradox that, 
at the same time, interviewing was nevertheless accepted among 
psychologists as necessary, and many handbooks and manuals 
dealing with “acceptable” practices in interviewing were widely
used. 

A second major reason for the neglect of careful studies of 
interviewing stems out of the rapid development and use of 
aptitude tests. Why struggle with a large number of variables 
in intricate and baffling combination, when a single test which 
yielded a measurable correlation with a criterion, could be em¬ 
ployed? Individuals were selected for specific jobs on the basis 
of test scores. Intelligence tests were used widely in appraising 
academic capacity. Yet responsible techniques for dealing with 
the total person were too frequently absent. 

The last few years prior to World War II had witnessed the 
emergence of a keen interest in a more careful analysis and 
improvement of interview techniques and skills. Several par¬ 
tially independent efforts contributed to this revival. Greater 
care was exercised in the interviewing of applicants for employment,
and there was developed a more standardized framework
for the interview (7). The use of interviewing in advertising
research and opinion polling invited more critical
attention to such aspects as level of diction, form of the ques¬ 
tion, and the like. These more objective methods tended to 
inject themselves into interviewing practice in the areas of 
clinical and abnormal psychology, vocational counseling, and 
other fields. Occasional books appeared which epitomized the 
best research results and applied efforts to interviewing (2, 12). 
During World War II such instruments as biographical records, 
rating scales, and careful interview procedures made an im¬ 
pressive contribution to methods of appraising personnel (10). 
All of these efforts have grown out of the feeling that as valuable 
as testing is, it is not enough, and that such methods must be 
supplemented by techniques and procedures which deal with 
the total person. 

The study here reported has to do with the use of interview 
procedures in estimating the intelligence of clients seeking vocational
counsel. By “intelligence” is meant that capacity which
is measured more or less accurately by the usual test of intelligence.
While the information available to the writer had
bearing on a rather wide range of adjustments, the information
synthesized in the process of the interview is drawn upon only
to the extent of indicating the client’s cleverness, alertness, or
capacity usually referred to as general intelligence. 

Procedure 

Fifty-four subjects, 50 men and 4 women, were used in the 
study. They were drawn from applicants to the counseling 
service of which the writer was in charge, for assistance in 
deciding what occupation to enter, in choosing appropriate 
courses of study, and related problems.¹ The subjects were
taken in order of application, no specifications being made as 
to age, sex, or other qualities. Care was exercised, however, to 
eliminate from the sampling all subjects who were introduced 
to the writer in such a way as to give any indication of back¬ 
ground, nature of problem, or abilities and limitations. Those 
subjects with whom the writer had contacts prior to the preliminary
interview were also excluded from the sampling. The
estimates of intelligence were based solely on the information
secured from the subject, independently of any informal or
official reports from other sources. Within the period covered
by the study, about 40 per cent of the applicants for the counseling
service were eliminated from the sampling due to such
prior information and reports.

¹ The Personal Counseling Service, West Side Y.M.C.A., New York City. The
study was completed shortly before the United States became involved in World
War II.

The subjects were remarkably heterogeneous. They ranged 
in age from 16 to 44, with a modal age of 17, an average age of 
25.9, and a median age of 24.9. Education varied from no 
formal grade completed to status in graduate and professional 
school, the average grade completed being 11.6. Intelligence, 
as measured later, varied from a percentile rank of 2 to 99 
plus. The group of fifty men and four women included several 
refugees from European countries as the result of Nazi perse¬ 
cution. 

One of the requirements of the counseling service was that 
the client fill out Aids to the Vocational Interview , an eight- 
page blank published by the Psychological Corporation. This 
blank provided space for a fairly comprehensive recording of 
the client’s family background, educational, vocational, and 
avocational interests and experiences, self-estimates of abilities 
and the like. It was usually filled out by the client following 
the preliminary interview. For the clients dealt with in the 
study, however, the blank was filled out prior to the pre¬ 
liminary interview. The interview required from 20 to 35 
minutes. The estimate of intelligence was in all instances limited 
to the impressions obtained from the subject in the process of 
the interview. The filled-in Aids was helpful, especially, in 
reducing the time which would have otherwise been required 
for each interview. Following the interview with each subject 
the estimate of intelligence was made in terms of a fancied 
percentile score such as the client would be expected to make 
on a test of intelligence suitable for entering college freshmen, 
and in competition with such a selected group. This procedure
was decided upon for the sake of uniformity, irrespective of 
the subject’s age or educational background. 

The estimate of intelligence was based on the principle of 
internal consistency, it being assumed that from a reasonably
wide range of cues and impressions there would emerge a constellation
or cluster of such items, each item of which is “valid”
by agreement with the others. Irrelevant or misleading cues,
not being typical of the trend, would be rejected as invalid
(1, 11). It should be held in mind that any single item of
information offered by the subject, or any single impression of 
the counselor may or may not be a valid cue. The test of its 
validity is whether or not it fits in with other cues secured 
from a variety of sources and directions. If so, then it may be 
assumed to be valid. It is obvious, however, that the validation 
of any such cue places a burden upon the interviewer to tap a 
sufficiently wide area of the subject’s background and present 
status as to reduce to a minimum the chances of error in 
judgment. If the exploration is too limited in scope any one 
cue may be weighted unduly, leading to erroneous appraisal of 
the trait or quality being estimated. Such errors undoubtedly 
contributed to errors of estimate to be reported later in the 
present study.

The writer will not attempt to offer a complete list of cues 
utilized in estimating intelligence. To do so would be impossible 
due to the subtlety or obscurity of certain cues and relationships 
synthesized on the basis of overall, intuitive judgment. A 
listing of the more important and obvious cues, however, may 
be helpful: (1) subject’s report of school grades earned; (2)
subject’s reported membership in honor clubs and societies; (3) 
subject’s reported standing in school class; (4) reported dis¬ 
tinctions and achievement outside of school; (5) reported leader¬ 
ship ability; (6) certain hobbies and activities such as chess, 
bridge, athletic activities, etc.; (7) conversational ability, use 
of words, etc.; (8) extent and nature of materials read; (9) 
activities obviously of compensating nature; (10) range of ac¬ 
tivities,—varied, or limited; (11) manner and style of re¬ 
sponding to questionnaire items; (12) spelling ability; (13) age 
in relation to grade completed in school,—over-age, accelerated, 
etc. The following constellation of cues, for example, would
point to high intelligence: membership in school honor society,
reported high-school average of 95, discriminating use of words
in conversation, more interested in English, mathematics, physical
sciences and foreign languages than in the more general
subjects, enjoyment of chess as a hobby, reading of sophisticated
books and periodicals. The following constellation would
point to limited intelligence: just average grades, “not much of
a student,” narrow range of vocabulary and lack of discriminating
choice of words in conversation, habitual reading of
tabloids and popular periodicals, more interested in general
subjects such as history than in the more exacting subjects,
unusual emphasis of physical activities, over-identification with
limited hobby. A highly intelligent individual may prefer to
read tabloids to other newspapers. The individual with mediocre
or low intelligence may unconsciously or otherwise exaggerate
his school standing even to the point of indicating honor society
membership. Such erroneous cues generally do not fit into the
constellation which seems generally typical of the individual,
and can be discarded as invalid.

TABLE 1

Age, Grade Completed, Actual and Estimated Percentile Scores, and Errors of Estimation
for Fifty-Four Subjects

Age   Grade   %ile   %ile   %ile    %ile    Errors of Esti.
      Comp.   ACE    Ohio   Aver.   Esti.   Over     Under

16    11      71     82     76.5    92      15.5
16    11      70     65     67.5    72      4.5
16    10      10     5      7.5     45      37.5
16    11      83     —      83*     90      7
16    11      74     59     66.5    86      19.5
16    11      61     74     67.5    66               1.5
17    11      46     34     40      20               20
17    11      24     42     33      50      17
17    11      96     68     82      81               1
17    11      44     61     52.5    96      43.5
17    10      79     29     54      84      30
17    11      57     73     65      86      21
17    11      82     54     68      78      10
17    11      18     12     15      62      47
18    11      58     63     60.5    92      31.5
18    11      55     —      55*     40               15
18    13      99     100    99.5    95               4.5
18    11      94     89     91.5    47               44.5
18    11      52     52     52      55      3
19    13      1      33     17.5    45      27.5
19    8       30     62     46      45               1
20    ?       98     88     93      80               13
20    13      93     84     88.5    83               5.5
?     12      96     95     95.5    65               30.5
23    15      97     89     93      72               21
24    16      84     82     83      87      4
25    16      88     80     84      72               12
25    ?       92     97     94.5    87               7.5
26    12      50     69     59.5    45               14.5
27    11      93     96     94.5    97      2.5
28    12      53     88     70.5    60               10.5
28    8       —      1      1*      50      49
29    15      88     78     83      65               18
29    9       7      —      7*      28      21
30    5       7      4      5.5     6       .5
30    13      78     91     84.5    94      9.5
31    12      99     97     98      60               38
31    13      98     66     82      83      1
31    8       97     99     98      92               6
31    16      —      90     90*     96      6
31    10      —      32     32*     40      8
33    8       11     6      8.5     24      15.5
33    10      54     74     64      45               19
33    18      95     97     96      95               1
35    12      90     —      90*     90
35    12      99     93     96      78               18
35    ?       94     —      94*     87               7
36    13      76     67     71.5    75      3.5
40    14      91     62     76.5    90      13.5
40    13      92     91     91.5    72               19.5
41    0       —      54     54*     60      6
43    8       75     95     85      80               5
44    16      91     99     95      98      3
?     14      68     —      68*     70      3

* Where subject took only one test the single score is considered “average.”

One further explanation should be made here. The writer 
made no attempt to weigh or evaluate each cue separately as 
has been done occasionally in the scoring of interview forms, 
and application blanks (3, 6, 13, 15). He rather trusted to his 
judgment to sense the less tangible relationships along with the 
more obvious cues in arriving at his final estimate. 

Following the interview and estimate, all subjects were given 
a battery of tests including two tests of intelligence: the
American Council on Education Psychological Examination for
College Freshmen (A. C. E.), and
the Ohio State University Psychological Test. The first is a time
limit and the second a work limit, or power test. Estimated 
and actual percentile scores and errors of estimate for the 54 
subjects are given in Table 1. Distributions of actual percentile 
scores on the A. C. E. and Ohio, and of estimated percentiles, 
show the population to be of considerably above-average intel¬ 
ligence. However, the subjects used in the present study are 
rather typical of clients in general who, throughout the years, 
applied for counseling to the Personal Counseling Service. All 
previous studies made of the counselee clientele show above 
average distributions of intelligence (4). 

Results 

The actual percentile scores on each test were correlated with 
the estimated percentiles of intelligence and the two tests were 
correlated with each other, by the Pearsonian product-moment 
formula. The following correlations were obtained: A. C. E. 
with estimates, r = .71; Ohio with estimates, r = .66; A. C. E. 
with Ohio, r = .77. It will be observed that agreement between 
estimated percentile scores and scores on each of the two tests 
of intelligence is just slightly lower than the correlation be¬ 
tween the tests. This poses an interesting question as to which 
of the two instruments or techniques would be the more valid 
in predicting educational or other achievement. 

The results will be examined briefly for the purpose of identifying,
if possible, any errors which may account for the deviation
of estimates from actual scores. Both tests of intelligence
were taken by 45 of the 54 subjects. For these the average of
the two test scores was taken as a basis for comparison with 
estimated scores. The difference between the estimated score 
and the average of the test scores, is designated “overesti¬ 
mation” or “underestimation.” Examination of Table 1 will 
show that 15 subjects were overestimated, and 14 were under¬ 
estimated by a margin of 10 or more percentile points. The 
average error of overestimation was 15.8 and of underesti¬ 
mation, 14.4 percentile points. The highest error of estimation 
was 49 percentile points, one subject being overestimated by 
this margin. One subject was underestimated by a margin of 
44 percentile points. For 25, almost half the subjects, however, 
the error of estimation was 10 or less percentile points. 
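
The scoring of errors described in this paragraph can be sketched in a few lines of Python. This is only an illustration: the function name `estimation_error` and the layout of the rows are assumptions made here, not part of the original study, and the three sample rows are (average, estimate) pairs read from Table 1.

```python
def estimation_error(test_average, estimate):
    """Classify one interview estimate against the average test percentile.

    Returns (direction, margin): "over" when the estimate exceeds the
    test average, "under" when it falls below, "none" when they agree.
    """
    diff = estimate - test_average
    if diff > 0:
        return ("over", diff)
    if diff < 0:
        return ("under", -diff)
    return ("none", 0.0)


# (average test percentile, estimated percentile) for three subjects in Table 1
rows = [(76.5, 92), (59.5, 45), (90, 90)]
results = [estimation_error(avg, est) for avg, est in rows]

# Errors of ten or more percentile points, the margin singled out in the text
large_errors = [r for r in results if r[1] >= 10]
print(results)
print(len(large_errors))
```

Applied to all fifty-four rows, the same classification would yield the counts of overestimated and underestimated subjects discussed above.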

For those subjects for whom the error of estimate was ten or 
more percentile points, a study of the records and such notes 
as had been made following the interview was made with the 
hope of identifying the factors responsible for the deviation. It 
is obvious that such an examination cannot be wholly objective. 
A preliminary inspection, however, had indicated unmistakably 
the presence of at least one such factor. Examination of data 
in columns 7 and 8, Table 1, shows clearly the tendency to 
overestimate the intelligence of younger subjects. Of the 19 subjects
eighteen or below, 10 were overestimated by ten or more
points, whereas only 3 were underestimated by this margin.
Of the 35 who were nineteen or above, only 5 were overestimated
by ten or more points, whereas 11 were underestimated
by this margin. The tendency to underestimate the intelligence 
of older subjects, however, is not as clear as the tendency to 
overestimate the intelligence of younger clients. 

The further examination of the filled-in Aids, in addition to
casting light on the importance of the age factor, also indicated
roughly several additional factors which seem to have bearing
on errors of estimation. These items, impressions, etc., were
summarized and appear in Tables 2 and 3. Several items in
Table 2 show higher frequency among those overestimated, and
in Table 3 for those underestimated.

Reports by subjects of scholarship standing, as indicated by
grades, position in class, and the like, are apparently the most
important single source of errors of estimation, there being a
tendency to rate individuals higher who reported scholarship
standing in or near the upper quarter of their class, and a
corresponding tendency to rate those lower who reported
scholarship below average. Other characteristics of those underestimated
were taciturnity, evidence of mediocre reading habits,
the early selection of specializing courses such as shop work,
typing and the like, and emotional maladjustment; and of those
overestimated, good conversational ability, appearance, and
positive traits of personality. In a good many cases inflated,
sketchy or too modest reports were corrected on the basis of
additional items of contra-information. It can readily be seen,
however, that a paucity of such “rounding out” information
might lead to the acceptance at face value of questionable
information as fact, resulting in mistakes in estimates. Had the
writer been more thorough and searching in his interviewing
some of the errors could probably have been avoided or reduced.

TABLE 2

Characteristics of Subjects Overestimated by Ten or More Percentile Scores

                                                               No.

Reported outstanding specific aptitudes
  (mathematics, technical, music, art, etc.) ................   9
Reported high scholarship, regents grades, honors, etc. .....   7
Conversational ability, easy flow of words ..................   3
Good habits of application ..................................   2
Good looks ..................................................   2
Miscellaneous ...............................................   6
  (One frequency each for the following characteristics: Well dressed,
  Practical judgment, Good vocational adjustment, Self-assurance,
  Foreigner,* language difficulty, Physical handicap due to birth
  injury*)

* In instances such as this it is doubtful if test scores indicate actual level of
intelligence.

TABLE 3

Characteristics of Subjects Underestimated by Ten or More Percentile Scores

                                                               No.

Mediocre or low scholarship .................................  13
Taciturn, uncommunicative ...................................   7
Mediocre reading habits .....................................   5
Early specializing courses ..................................   5
Emotionally maladjusted .....................................   4
Overidentification with narrow interests ....................   3
Failure to finish courses ...................................   2
Miscellaneous ...............................................   6
  (One frequency each for the following characteristics: Poor speller,
  frequent school absences, poor study habits, marked facial asymmetry,
  dull appearance, extreme dependence on others)

Summary of Results 

1. It is possible to estimate intelligence test scores with considerable
validity on the contents of the interview. Correlations
between estimates and test scores are .71 and .66, just slightly
lower than the correlation between the two tests, .77.

2. There was a tendency to overestimate the intelligence of 
younger subjects, and to a lesser extent to underestimate the 
intelligence of older clients. 

3. Underestimation and overestimation of intelligence seem 
to be related also to reported achievement, reported specific 
aptitude, negative and positive personality qualities, habits of 
application, and the like. 

Discussion of Results 

While it seems clearly possible to estimate intelligence, ability 
to learn, etc., by interview, it also seems unmistakably clear 
that the validity of the estimates will depend on two general 
factors or conditions. First, there must be available a sufficient 
range of reported information, together with reasonably ade¬ 
quate facilities for interviewing. Second, the experience, com¬ 
petency and skill of the interviewer would seem to be a primary 
requisite for the validity of estimates. 

The relative values of estimates and actual test scores require 
discussion of a further possibility. Heretofore in the present 
discussion the differences between actual and estimated scores 
have been referred to as “errors of estimation” on the traditional
assumption that actual test scores should be the more
valid in predicting scholastic and related types of achievement. 
It is obvious, however, that in the absence of objective vali¬ 
dation of either estimates or tests for the group of subjects here 
studied, the relative validities of estimates and tests can only 
be a matter of conjecture. It seems appropriate to postulate 
that in dealing with groups such as here reported, careful interviewing
based on materials supplied by the individual himself
and impressions gained from such interviewing, independently
of official evidence of past performance, grade transcripts, and
so on, can be at least as valid in predicting further educational 
and related achievement as a good test of intelligence. The 
interview, operating at its highest level, however, is not offered 
as a substitute for tests of intelligence. The conclusion offered 
is a reminder to counselors that the storehouse of information 
available through systematic interviewing, a source too little 
utilized by many counselors, should not be neglected; and that 
in the appraisal of the capacities and interests of the client the 
interview based upon such experience must be regarded as an 
essential supplement to the more objective measures. In closing 
it seems appropriate to suggest that counselors in training would 
find it good practice to utilize interviewing procedures in esti¬ 
mating the intelligence of clients in advance of testing. 

REFERENCES 

1. Allport, Gordon W. Personality: A Psychological Interpretation.
   New York: Henry Holt and Co., 1937.

2. Bingham, W. V. and Moore, B. V. How to Interview. New York:
   Harper and Brothers, 1931.

3. Goldsmith, Dorothy. “The Use of the Personal History Blank
   as a Salesmanship Test.” Journal of Applied Psychology,
   VI (1922), 149-155.

4. Hanna, Joseph V. “Job Stability and Earning Power of Emotionally
   Maladjusted as Compared with Emotionally Adjusted
   Workers.” Journal of Abnormal and Social Psychology,
   XXX (1935), 155-163.

5. Hollingworth, H. L. Vocational Psychology and Character Analysis.
   New York: D. Appleton Co., 1929.

6. Hovland, Carl I. and Wonderlic, E. F. “Prediction of Industrial
   Success from a Standardized Interview.” Journal of Applied
   Psychology, XXIII (1939), 537-546.

7. Jenkins, John G. “Characteristics of the Question as Determinants
   of Dependability.” Journal of Consulting Psychology,
   V (1941), 164-169.

8. Magson, E. H. “How Do We Judge Intelligence?” British Journal
   of Psychology, Supplement No. 9, 1926, 1-108.

9. McMurray, R. N. “Validating the Patterned Interview.” Personnel,
   XXIII (1947), 263-272.

10. Newman, Bobbitt and Cameron. “The Reliability of the Interview
    Method in an Officers Candidate Program.” American
    Psychologist, I (1946), 103-109.

11. Primoff, Ernest S. “Correlations and Factor Analysis of the
    Abilities of the Single Individual.” Journal of General Psychology,
    XXVIII (1943), 121-132.

12. Roethlisberger, F. J. and Dickson, W. J. Management and the
    Worker. Cambridge, Mass.: Harvard Univ. Press, 1940.

13. Russell, W. and Cope, G. V. “A Method of Rating the History
    and Achievement of Applicants for Positions.” Public Personnel
    Studies, III (1925), 202-209.

14. Scott, W. D. “Selection of Employees by Means of Quantitative
    Examinations.” Annals of the American Academy of Political
    and Social Science, LXV (1916), 182-193.

15. Snedden, Donald. “Measuring General Intelligence by Interview.”
    Psychological Clinic, XIX (1930), 131-134.



INCLUSION OF “NONE OF THESE” MAKES
SPELLING ITEMS MORE DIFFICULT 


MARCIA BOYNTON 
U. S. Civil Service Commission 

A special study of the spelling items in its Clerk and Steno¬ 
grapher-Typist Examinations has been undertaken by the U. S. 
Civil Service Commission. All general-test items of these examinations
are subjected to systematic statistical evaluation,
but further analysis is being made of this one type. The purpose 
of the study is to determine what elements make for item diffi¬ 
culty, in order to establish guides for improving the control of 
difficulty in the many alternate forms of examinations required. 
The amount of information is insufficient as yet to warrant any 
conclusions. However, a few findings are emerging. 

An indication of the value of the alternative “none of these” 
is one of the preliminary findings. Each of the spelling items has 
three alternative spellings of a single word, with "none of these” 
as a fourth alternative. The competitor is instructed to select 
the correct spelling, if any, or to select the fourth alternative. 

Although an item type with only four alternatives is not so 
desirable as one with five, so few words lend themselves to a 
sufficient variety of plausible misspellings that the use of five 
choices was not undertaken. It is recognized that the use of 
various misspellings is undesirable for the further reason that it 
emphasizes wrong instead of correct spellings. To avoid both of 
these objectionable features, each item could include four or 
five different words. The use of different words in this way, how¬ 
ever, presents too great a problem to test constructors in two 
respects. First, it exhausts the supply of suitable words too 
quickly in view of the constant need for new sets of examination 
papers. Second, it increases too greatly the number of words 
which must not appear in any of the instructions, the vocabu¬ 
lary items, the reading items, or the grammar items of the same 
test booklet. 




The purpose of including "none of these" as an alternative 
was to increase the number of possible alternatives, thereby 
reducing the chance that competitors’ guesses will be correct. 

Analysis of competitors’ answers shows that an item that does 
not include the correct spelling is much more likely to prove 
difficult than an item in which the correct spelling appears. As 
is to be expected, an item in which there are two or more points 
of difficulty is more likely to prove difficult than an item in 
which there is only one such point. For example, in a sample
item, “occasion,” a poor speller might wonder whether to use a
single consonant, or whether to double both the “c” and the
“s.” The two findings are consistent, since a constructor would
not be able to devise three attractive misspellings of a word un¬ 
less it contained more than one point of plausible misspelling. 



A TABLE AND AN ABAC FOR TESTING THE 
SIGNIFICANCE OF RHO 


FRANK M. DU MAS 
University of Texas 

I. Introduction 

Statisticians have developed several indices of relationship 
based on ranks. It seems necessary, therefore, to explicitly de¬ 
fine the statistical quantity with which this paper is concerned. 
This statistical quantity derives from Spearman; it is usually
called the coefficient of rank difference correlation, and will be
referred to in this paper as rho or ρ. Rho is defined as

$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}, \tag{1}$$

where Σd² is the sum of the squared differences between paired
ranks; N is the number of pairs of ranks.
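
As a minimal sketch of this definition, the following Python function computes rho from two lists of ranks. The name `rho_from_ranks` and the five pairs of ranks are illustrative assumptions; the function expects ranks already assigned and does not handle ties.

```python
def rho_from_ranks(ranks_x, ranks_y):
    """Rank difference correlation: 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("at least two pairs of ranks are required")
    # Sum of squared differences between paired ranks
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical ranks for N = 5; the differences are 1, 1, 0, 1, 1, so sum(d^2) = 4
rho = rho_from_ranks([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
print(round(rho, 2))  # 1 - 24/120 = 0.8
```

Identical rankings give rho = 1 and exactly reversed rankings give rho = -1, which is a convenient check on any implementation.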


II. Older Method of testing the Significance of Rho 

The older method of testing the significance of rho is to 
compute the standard error of rho, divide rho by its standard 
error, enter the normal probability table with the quotient, 
and then make a statement concerning the probability of obtaining
at least rho = 0 in future samples of the same size
taken from the same population. The standard error of rho,
σ_ρ, is usually computed from formula (2) as follows:

$$\sigma_\rho = \frac{1.04\,(1 - \rho^2)}{\sqrt{N - 1}} \tag{2}$$


There are at least three criticisms of this method of testing 
the significance of rho. First, formula (2) is only a rough ap¬ 
proximation of the standard error of rho. Second, the distribu¬ 
tion of rho is markedly skewed when rho is moderate or large 
and, therefore, the normal probability table should not be used. 
Third, in those instances where rho is most frequently applied
(say, when N < 9), the sampling distributions of rho are most
peculiar. When N = 3 or 4, they are bimodal; when N = 5,
6, 7 or 8, they have a serrated profile. When N > 9, the distribution
may be said to be unimodal, and as N → ∞ the
sampling distributions approach the normal distribution as the
limit. However, in every case the sampling distributions, for
ρ = 0, are symmetrical. The methods that follow obviate these
criticisms to a considerable degree.

Fig. 1. Abac for Testing the Significance of Rho When N > 9

III. Newer Method of Testing the Significance of Rho

The sampling distributions of rho when N > 9¹ may be
said to be unimodal. Actually, these distributions have a sawtooth
profile which tends to smooth out as N increases and
approach the normal distribution as the limit. We shall assume
the population of rho for samples of N > 9 to be normally
distributed. Under this assumption, we may then enter Student’s
distribution and test the significance of rho. Kendall
(1, p. 401) suggests these assumptions and procedures by an
example in which N equaled 10. We shall use this procedure
when N > 9.

¹ This value is chosen arbitrarily; we could have chosen 8, 10, 11, 12, etc.

TABLE 1

Table for Testing the Significance of Rho when N < 9. Values with an Asterisk are
Probabilities Rather than Levels of Confidence

N = 4           N = 5           N = 6           N = 7           N = 8
ρ     % L.C.    ρ     % L.C.    ρ     % L.C.    ρ     % L.C.    ρ     % L.C.

1.00  8         1.00  2         1.00  .0028*    1.00  .0004*    1.00  .000050*
.80   33        .90   8         .94   2         .96   .0028*    .98   .00040*
.60   41        .80   13        .89   3         .93   .0068*    .95   .00114*
.40   71        .70   23        .83   6         .89   1         .93   .0022*
.20   92        .60   33        .77   10        .86   2         .90   .0046*
.00   100       .50   45        .71   14        .82   3         .88   .0072*
                .40   52        .66   18        .79   5         .86   1
                .30   68        .60   24        .75   7         .83   2
                .20   78        .54   30        .71   9         .81   2
                .10   95        .49   36        .68   11        .79   3
                .00   100       .43   42        .64   14        .76   4
                                .37   50        .61   17        .74   5
                                .31   56        .57   20        .71   6
                                .26   66        .54   24        .69   7
                                .20   71        .50   27        .67   8
                                .14   80        .46   30        .64   10
                                .09   92        .43   35        .62   11
                                .00   100       .39   40        .60   13
                                                .36   44        .57   15
                                                .32   50        .55   17
                                                .29   56        .52   20
                                                .25   ?         .50   22
                                                .21   66        .48   24
                                                .18   71        .45   27
                                                .14   78        .43   30
                                                .11   84        .40   33
                                                .07   91        .38   36
                                                .04   96        .36   39
                                                .00   100       .33   43
                                                                .31   46
                                                                .29   50
                                                                .26   54
                                                                .24   58
                                                                .21   62
                                                                .19   66
                                                                .17   70
                                                                .14   75
                                                                .12   79
                                                                .10   84
                                                                .07   88
                                                                .05   93
                                                                .02   98
                                                                .00   100

Figure I is an abac to be used in testing the significance of 







436 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 

rho when N > 9. Formula (3) has been suggested by Kendall
(1, p. 401) as appropriate for testing the significance of rho.

$$t = \rho \sqrt{\frac{N - 2}{1 - \rho^{2}}} \qquad (3)$$

Since N - 2 are the degrees of freedom, df, we may substitute
and solve for ρ. When this is done we have

$$\rho = \frac{t}{\sqrt{t^{2} + df}} \qquad (4)$$

It was then an easy matter to enter formula (4) with t and 
df for the various levels of confidence shown in Figure I, The 
contours in Figure I indicate changes in rho as a function of 
the sample size with the level of confidence for rejecting the 
null hypothesis as the parameter. 
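
The relation between t, df, and rho described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `t_from_rho` and `rho_from_t` are hypothetical, and the critical value of t for a chosen level of confidence is assumed to be looked up separately in a table of Student's distribution.

```python
import math


def t_from_rho(rho: float, n: int) -> float:
    """t statistic for a rank correlation rho from n pairs (df = n - 2)."""
    df = n - 2
    return rho * math.sqrt(df / (1.0 - rho * rho))


def rho_from_t(t: float, df: int) -> float:
    """The value of rho that corresponds to a given t at df degrees of freedom."""
    return t / math.sqrt(t * t + df)


if __name__ == "__main__":
    n, rho = 27, 0.38              # the worked example used with Figure I
    t = t_from_rho(rho, n)
    print(round(t, 3))             # compare with the tabled t for df = 25
    print(round(rho_from_t(t, n - 2), 3))  # recovers the rho we started from
```

Entering `rho_from_t` with a tabled t and each df in turn gives one point on a contour of the kind drawn in Figure I.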

Figure I may be used in the following manner. Assume a
sample of N = 27 and ρ = .38. Entering Figure I with these
values we find that we may reject the null hypothesis at the
5 per cent level of confidence.

Because of the unusual characteristics of the sampling distributions
of rho when N < 9, the t test of significance would
be inappropriate. But it is precisely for the small values of N
that a significance test is needed so badly. For example, clinical
research is often an intensive study of a few individuals and
rho is often used in such situations. Table 1 was constructed
with these considerations in mind. Kendall (1, Table 16.2) has
tabled the probability of obtaining the various values of Σd²
for several different values of N. The transformation of Σd²
and probabilities into rho and levels of confidence is obvious.
Table 1 allows us to make (within rounding errors) an ‘exact’
test of the null hypothesis for samples of 4 to 8 cases.
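
An exact test of this kind can be sketched by enumerating every permutation of N untied ranks. The sketch below is not the procedure used to build Table 1, which was taken from Kendall's tabled probabilities. It assumes the usual relation rho = 1 - 6Σd²/(N(N² - 1)) and a two-sided test on the absolute value of rho. The function names and the `tol` parameter, which absorbs two-decimal rounding of rho, are hypothetical.

```python
from itertools import permutations


def spearman_rho(sum_d2: int, n: int) -> float:
    """Spearman's rho from the sum of squared rank differences."""
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


def exact_two_sided_p(n: int, rho_obs: float, tol: float = 0.005) -> float:
    """Probability under the null hypothesis that |rho| >= |rho_obs| for n untied ranks.

    tol lets a rho rounded to two decimals still match its exact value.
    """
    base = tuple(range(1, n + 1))
    hits = 0
    total = 0
    for perm in permutations(base):
        sum_d2 = sum((a - b) ** 2 for a, b in zip(base, perm))
        if abs(spearman_rho(sum_d2, n)) >= abs(rho_obs) - tol:
            hits += 1
        total += 1
    return hits / total


if __name__ == "__main__":
    # Express the result as a per cent level of confidence.
    print(round(100 * exact_two_sided_p(4, 0.80)))
```

For N = 4 and rho = .80 the enumeration counts 8 of the 24 permutations, or about 33 per cent. The enumeration stays small over the range of the table, since N = 8 requires only 40,320 permutations.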

Table 1 may be used in the following manner. Assume a
sample of N = 7 and ρ = .57. We may then reject the null
hypothesis at the 20 per cent level of confidence.

In both Figure I and Table 1 we are testing hypotheses 
concerning the absolute value of rho. 

REFERENCES 

1. Kendall, M. G. The Advanced Theory of Statistics. Vol. I. London:
Charles Griffin and Co., 1945.



RECENT PUBLICATIONS RECEIVED 

Barahal, George D. Converting a Veterans Guidance Center. Stanford:
Stanford University Press, 1950. 100 pp. $1.50.

Hamilton, Kenneth W. Counseling the Handicapped in the Rehabilitation
Process. New York: The Ronald Press Co.,
1950. 196 pp. $3.50.

Porter, Jr., E. H. An Introduction to Therapeutic Counseling. Boston:
Houghton Mifflin Co., 1950. 223 pp. $2.75.

Spearman, C. and Jones, Ll. Wynn. Human Ability. London:
Macmillan & Co., Ltd., 1950. 198 pp. $2.50.

Ulett, George. Rorschach Introductory Manual. St. Louis: Educational
Publishers, Inc., 1950. 48 pp. $3.00.

Proceedings of the 14th Annual Guidance Conference held at Purdue
University, April 4 and 5, 1949. Studies in Higher Education,
LXIX. Lafayette: Division of Educational Reference,
Purdue University. 80 pp. $1.50.





THE CONTRIBUTORS 


Marcia Boynton—M.A., George Washington University. Principal
Assistant, Head Assistant, Personnel Research, Research Division;
Program Reviewer, Personnel Utilization; Examiner, Test Development
Unit, U.S. Civil Service Commission. Associate Member, American
Psychological Association. Member, D.C. Psychological Association,
Society for Personnel Administration.

William R. Birge—B.A., Princeton University, 1941. With the 
U.S. Navy, 1942-1946. Graduate student, Duke University, 1946- 
1950. Instructor, Rensselaer Polytechnic Institute, 1950—. 

Claude E. Buxton—Ph.D., University of Iowa, 1937. Instructor,
University of Iowa, 1937-1938. Research Associate, Swarthmore College,
1938-1939. Instructor, Northwestern University, 1939-1942.
Assistant Professor, University of Iowa, 1942-1946. Associate Professor,
Northwestern University, 1946-1949. Professor, Yale University,
1949-. Author of articles on human and animal learning, methodology,
and on the teaching of psychology. Fellow, American
Psychological Association. Member, Society of Experimental Psychologists,
Sigma Xi, American Association for the Advancement of
Science, Midwestern Psychological Association, Eastern Psychological
Association.

N. M. Downie—Ph.D., University of Syracuse, 1948. Instructor 
in Biology, Robert College, Istanbul, Turkey, 1936-1939. Instructor 
in Education and Graduate Assistant, Evaluation Service Center, 
Syracuse University, 1946-1948. Assistant Professor of Education, 
State College of Washington, 1948-. 

Frank M. du Mas—M.A., University of Virginia, 1941. Graduate
Student, University of Virginia, 1941-1942. War work and military
service, 1942-1945. Instructor in Psychology, University of Denver,
1945-1947. Research Assistant, University of Iowa, 1947-1948. Associate
Professor of Psychology, Florida State University, 1948-. On
contract, Office of Naval Research, under the guidance of the American
Council on Education.

Joseph V. Hanna—Ph.D., New York University, 1928. Instructor, 
Assistant Professor, Associate Professor, New York University, 1926- 
1949. Director, Veterans Advisement, Vocational Service Center, 
Y.M.C.A., New York City, 1944-1947. Consultant to Vocational 
Service Center, at present. Co-author of The Dissatisfied Worker 
and author of articles in professional journals. Fellow, American 
Psychological Association. Member, N. Y. State Psychological Association,
Metropolitan N. Y. Association for Applied Psychology,
National Vocational Guidance Association. Diplomate, American 
Board of Examiners in Professional Psychology. 

Joseph C. Heston—Ph.D., Ohio State University, 1941. Science
Instructor and Director of Remedial Work, West Jefferson, Ohio,
High School, 1932-1939. Assistant in Psychology, Ohio State University,
1939-1941. Instructor in Psychology, 1941-1942; Assistant
Professor, 1942-1946; Director of Bureau of Testing and Research,
1944-; Associate Professor, 1946-1950; Professor, 1950-, De Pauw
University. Research Consultant, Farm Security Administration,
1940-1942. Author of the Heston Personal Adjustment Inventory
(World Book Co.) and of articles in professional journals. Member,
American Psychological Association, American College Personnel Association,
American Association of University Professors, Sigma Xi.

J. W. Holley—M.A., University of Southern California, 1947. 
Counselor, University of Chicago, 1947. Instructor in Psychology, 
University of Illinois, Chicago Undergraduate Division, 1947-1948. 
In charge of Admissions Testing, Northwestern University, 1948- 
1949. Member, Psi Chi, Sigma Xi. 

L. J. Lins—Ph.D., University of Wisconsin, 1946. Teacher, rural
school, Highland, Wisconsin, 1939-1940. Teacher and Principal, City
Elementary, Mineral Point, Wisconsin, 1940-1942. Teacher and
Director, Visual Education, Township High School, Amboy, Illinois,
1942-1943. Teacher, Central High School, Madison, Wisconsin, and
Research Assistant, Teacher Personnel Research Bureau, University
of Wisconsin, 1943-1944. All-University Fellow, School of Education,
University of Wisconsin, 1944-1945. Instructor, 1945-1946; Assistant
Professor, 1946-1947, Dept. of Education, University of Detroit.
Assistant to Executive Director, Bureau of Guidance and Records
(Asst. Prof.), 1947-1948; Director, Office of Statistics and Research
(Asst. Prof.), 1948-, University of Wisconsin. Author of articles on
audio-visual education, teacher education, personnel, and measurement.
Member, Phi Delta Kappa, American Statistical Association,
Society for the Advancement of Education, American College Personnel
Association, Wisconsin Education Association.

Milton M. Mandell—B.A., New York University, 1933. Assistant
Director of Examinations, Los Angeles City Civil Service
Commission, 1939-1940. Classification Consultant, State of Connecticut,
1940-1941. Regional Personnel Officer, OEM, 1941-1942. Personnel
Officer, Office of Program Vice-Chairman, War Production Board,
1942-1943. Chief Analyst, Committee For Congested Areas, 1943-
1944. Chief, Administration and Management Testing, U. S. Civil
Service Commission, 1944-. Member, American Society of Public
Administration, Civil Service Assembly.

C. Robert Pace—Ph.D., University of Minnesota, 1937. Instructor
and Research Associate, General College, University of Minnesota,
1937-1940. Research Associate, Commission on Teacher Education,
American Council on Education, 1940-1943. Head, Research Unit
and Field Research Section, Bureau of Naval Personnel, 1943-1947.
Associate Director, Director, Evaluation Service Center, Syracuse
University, 1947-. Author of They Went to College (Univ. of Minnesota
Press) and co-author of Evaluation in Teacher Education
(American Council on Education); author of articles on attitude
measurement, evaluation, and higher education. Fellow, American
Psychological Association. Member, American Educational Research
Association, National Society for the Study of Education, American
Association for Public Opinion Research.

Robert G. Smith, Jr.—M.A., University of Florida, 1947. Teaching
Assistant and graduate student, University of Illinois, 1947-. Author 
of work on color discrimination. Member, Phi Kappa Phi, Sigma Xi. 

Robert M. W. Travers—Ph.D., Columbia University, 1941. Research
Associate, Teachers College, Columbia University, 1938-1941.
Instructor in Psychology, Ohio State University, 1941-1943. Personnel
Technician, Adjutant General’s Office, 1943-1945. Assistant Director,
Graduate Record Examination, 1945-1946. Associate Professor
of Psychology and of Education, and Chief, Evaluation and Examinations
Division of the Bureau of Psychological Services, 1947-.
Author of articles on problems of evaluation, statistical methods
related to the construction and use of tests, and of a book entitled
Teacher-Made Objective Tests of Achievement. Associate Member,
American Psychological Association. Member, American Educational
Research Association.

Maurice E. Troyer—Ph.D., Ohio State University, 1935. Superintendent,
Bureau of Township Schools, Princeton, Illinois, 1925-
1929. Assistant Professor of Psychology, Bluffton College, 1930-1932.
Instructor in charge of Remedial Program, Ohio State University,
1933-1936. Assistant Professor of Education, Syracuse University,
1936-1939. Associate Professor, 1939. Associate in Evaluation, Commission
on Teacher Education, American Council on Education,
1940-1943. Director, Bureau of School Services, Professor of Education,
Syracuse University, 1943. Director, Evaluation Service Center,
Syracuse University, 1945. Member, American Psychological Association,
American Association of Applied Psychology, American Educational
Research Association, American Association for the Advancement
of Science.

Wimburn L. Wallace—Ph.D., University of Michigan, 1949.
Assistant, Department of Psychology, 1939-1941, 1949; Assistant
Clinician, Psychological Clinic, 1940-1941, University of Michigan.
Senior Instructor, Curtiss-Wright Technical Institute, Glendale, California,
1941-1944. Personnel Officer, U. S. Navy, 1944-1946. Chief,
V. A. Guidance Center, 1946-1948; Research Associate, Evaluation
and Examinations Division of the Bureau of Psychological Services,
1948-1949, University of Michigan. Director of Guidance, University
of Massachusetts, 1949-. Member, American Psychological Association,
American College Personnel Association, Sigma Xi, Phi
Sigma, Phi Delta Kappa.




THE 1950 CONVENTION PROGRAM 

SUNDAY, MARCH 26 
EXECUTIVE COUNCIL MEETING, ACPA 
MONDAY, MARCH 27 

GENERAL SESSION 

Presiding... Hilda Threlkeld 

Dean of Women, University of Louisville 
Symposium: “Counseling Problems and Techniques: Developments
for the Future in the Light of an Evaluation of the
Present.”

“Developments in Counseling by Faculty Advisers” 

Carroll Miller, Assistant Dean of College of Liberal 
Arts, Howard University 
“Developments in Residence Hall Counseling" 

Merle M. Ohlsen, Associate Professor of Education, 
Washington State College 
“Developments in Counseling Bureaus and Clinics”

Royal B. Embree, Assistant Director, Counseling Bureau,
University of Texas (Read by Gordon Anderson, Director 
of Counseling Bureau, University of Texas) 

LUNCHEON 

Presiding. Mitchell Dreese 

Dean of the Summer Sessions and Professor of Educational 
Psychology, George Washington University 

"No Vain Imaginings". Thelma Mills 

Director Student Affairs for Women, University of Missouri,
and President ACPA

FIRST BUSINESS MEETING 

Presiding. Thelma Mills 

Director Student Affairs for Women, University of Missouri,
and President ACPA
Reports: 

Kate Mueller, Chairman Committee on Research 
Clifford Houston, Chairman Committee on Standards 
George A. Pierson, Chairman Committee on Nominations 
Lyle W. Croft, Chairman Committee on Membership 

SECTIONAL MEETING 

Presiding. Jacob H. Cunningham 

Dean of Students, Lynchburg College 
“The Role of the Church Related College in Higher Education.” 
Raymond F. McLain, President, Transylvania College, Lexington,
Kentucky




TUESDAY, MARCH 28
“Council Day” 

WEDNESDAY, MARCH 29 
SECTIONAL MEETINGS: 

"Major Problems of Personnel Administration of Concern 
to All College Personnel Workers” 

1. Presiding. Dugald Arbuckle

Director Student Personnel, School of Education, Boston
University

Panel Discussion (for those from large universities and colleges)
Panel Members:

Martin Snoke, Assistant to the Dean of Students, University
of Minnesota

John L. Bergstresser, Assistant Dean of Students, University
of Chicago

Daniel D. Feder, Dean of Students, University of Denver

2. Presiding. Everett B. Sackett

Dean of Student Administration, University of New Hampshire

Panel Discussion (for those from middle-sized colleges and 
universities) 

Panel Members: 

Robert Kamm, Dean of Students, Drake University
Nathan Kohn, Registrar, Washington University 
William C. Craig, Acting Dean of Students, Washington 
State College 

3. Presiding. L. R. Palmerton

Director Student Personnel, South Dakota School of Mines 
and Technology 

Panel Discussion (for those from small liberal arts colleges, 
church-related colleges, and teachers colleges) 

Panel Members: 

Lawrence Riggs, Dean of Students, DePauw University
Helen M. Voorhees, Director, Appointment Bureau, 
Mount Holyoke College 
Louise T. Paine, Dean, Elmira College 

SECOND BUSINESS MEETING 

Presiding. Thelma Mills 

Director Student Affairs for Women, University of Missouri,
and President ACPA
Reports: 

Paul McMinn, Chairman Committee on Publications 
Ralph Carli, Chairman Committee on International Relations

C. H. Reudisili, Chairman Committee on Proceedings 
GENERAL SESSION 

Presiding. Eugene L. Shepard 

Dean of Student Personnel, Stephens College 










Main Speech: “Evaluation and Research in Group Dynamics”
Kenneth F. Herrold, Assistant Professor of Education, 
Teachers College, Columbia University 
Two Illustrative Studies, reported by: 

Ira J. Gordon, Kansas State College 
David S. Brody, Montana State College 

GENERAL SESSION 

Presiding. Paul C. Polmantier 

Director University Testing and Counseling Services, University
of Missouri

Symposium: "Problems of Evaluation in Student Personnel 
Work" 

“How to Go About The Process of Evaluating Student 
Personnel Work” 

William M. Gilbert, Acting Director Student Counseling 
Bureau, University of Illinois 
"Major Limitations in Current Evaluation Studies” 

Ruth Strang, Professor of Education, Teachers College, 
Columbia University 
Two Illustrative Studies, Reported by: 

Robert B. Kamm, Dean of Students, Drake University
Edgar Z. Friedenberg, Adviser, University of Chicago 

SOCIAL HOUR 

Hostess. Anna M. Hanson

Director of Placement, Simmons College 

SECTIONAL MEETINGS: 

(These will be Discussion Groups—no planned speeches— 
attendance at each limited to the first 25 people to apply for 
special admission card at Information Desk. Prerequisite for 
obtaining card is willingness to talk on the topic listed.) 

1. Discussion Leader. John Withal

Assistant Professor, Department of Education, Brooklyn 
College 

Topic: “To What Extent Should the Use of Test Results Be 
Limited to Qualified Personnel?” 

2. Discussion Leader. Robert H. Shaffer

Assistant Dean of Students, University of Indiana 
Topic: “How Can We as Student Personnel Workers Stimulate 
and Motivate the Student with Higher Ability?” 

3. Discussion Leader. M. Catherine Evans

Assistant Director of Counseling, University of Indiana 
Topic: “The Use of Sociometric Techniques in Residence Hall 
Work.” 

4. Discussion Leader. Nathan Kohn, Jr.

Registrar, University College, Washington University 
Topic: “Are Freshmen Orientation Courses Desirable?” 









THURSDAY, MARCH 30 

SECTIONAL MEETINGS:

1. Presiding. C. W. McCracken

Dean of Students, Muskingum College 
Symposium: “Student Activities in Relation to College Personnel
Work”

“The Role of Student Government in the Student Personnel 
Program” 

Brother Louis, Dean, St. Mary’s College, Winona, Minnesota

“Student Personnel Work and the National Student Association”

Gordon Klopf, Chairman, National Advisory Council, 
N.S.A., University of Wisconsin 

“Contributions of the Student Union to the Total Student 
Personnel Program” 

Donovan D. Lancaster, President, National Association 
College Unions, and Director, Moulton Union, Bowdoin 
College 

2. Presiding. Robert F. Moore

Director, Personnel Office, Columbia University 
Panel Discussion: “Reciprocal Contributions of Student Personnel
and Industrial Personnel”

Panel Members: 

Donald S. Bridgman, Personnel Department, American 
Telephone & Telegraph Co. 

Forrest H. Kirkpatrick, Dean of Students, Bethany 
College 

Otis C. McCreery, Director of Training, Aluminum Company
of America

SECTIONAL MEETINGS: 

1. Presiding. Walter F. Johnson

Associate Professor, Institute of Counseling, Testing and 
Guidance, Michigan State College 
Symposium: “Selection and Training of College Personnel 
Workers” 

Speakers: 

“Problems and Trends in the Selection for Training of College 
Personnel Workers” 

George A. Kelly, Director, Psychological Clinic, Ohio 
State University 

"Major Issues and Trends in the Graduate Training of 
College Personnel Workers” 

Willard W. Blaesser and Clifford P. Froehlich, 
United States Office of Education 

2. Presiding. Donald J. Shank

Vice President, Institute of International Education, New 
York 









Symposium: “Broader Horizons in Personnel Work” 

Speakers: 

“The Employment Outlook for 1950 College Graduates” 
Ewan Clague, Commissioner Labor Statistics, United 
States Department of Labor 

“Aspects of Manpower Mobilization of Significance to College 
Personnel Workers” 

James C. O’Brien, Associate Director Manpower, National 
Security Resources Board 
“Our Stake in the Occupied Countries” 

Harold E. Snyder, Director, Commission on Occupied 
Areas, American Council on Education 
“Plans for the New International Christian University in 
Japan” 

Maurice E. Troyer, Vice President in Charge Curriculum
and Instruction, Japan International Christian University
Foundation



AMERICAN COLLEGE PERSONNEL ASSOCIATION, OFFICERS AND 

COMMITTEES 

OFFICERS, 1949-50

President: Thelma Mills, Director, Student Affairs for Women, 
University of Missouri 

Vice President: E. H. Hopkins, Vice President, State College of 
Washington 

Secretary: Robert H. Shaffer, Assistant Dean of Students, Indiana 
University 

Treasurer: Marcia Edwards, Associate Dean, College of Education, 
University of Minnesota 

EXECUTIVE COUNCIL, 1949-50 

Gordon V. Anderson, Director, Bureau of Testing and Counseling, 
University of Texas 

Willard W. Blaesser, Specialist for Student Personnel Programs, 
U. S. Office of Education 

Edward S. Bordin, Director, Bureau of Psychological Services, 
University of Michigan 

Daniel D. Feder, Dean of Students, University of Denver 

Forrest H. Kirkpatrick, Dean of Students, Bethany College 

OFFICERS, 1950-51 

President: Thelma Mills, Director, Student Affairs for Women, 
University of Missouri 

Vice President: E. H. Hopkins, Vice President, State College of 
Washington 

Secretary: Robert H. Shaffer, Assistant Dean of Students, Indiana 
University 

Treasurer: Marcia Edwards, Associate Dean, College of Education, 
University of Minnesota 

EXECUTIVE COUNCIL, 1950-51 

Gordon V. Anderson, Director, Bureau of Testing and Counseling, 
University of Texas 

Lyle W. Croft, Director of Student Personnel Services, University of
Kentucky

Clifford E. Erickson, Professor of Education, Michigan State 
College 

A. Blair Knapp, Vice President, Temple University 

Donald E. Super, Professor of Education, Teachers College, Columbia
University

PROGRAM COMMITTEE, 1949-50 

Cornelia D. Williams, Chairman, Associate Professor and Counselor,
General College, University of Minnesota



Norman Lange, Director of Student Personnel, University of Vermont

Dugald S. Arbuckle, Director of Student Personnel, Boston University

John S. Beard, 5835 Kimbark, Chicago 37, Illinois 

Lucile B. Brown, Child Education Foundation, New York, N. Y. 

Ralph B. Bridgman, President, Merrill Palmer School 

Henry J. Cunningham, Dean of Students, Lynchburg College 

Janice A. Janes, Counselor in Occupational Guidance, Stephens 
College 

Victor B. Johnson, Associate Dean of Men, Clark University

Margaret Ruth Smith, Associate Admissions Officer, Wayne University

Thomas S. Richardson, Director of Student Personnel, Texas 
Christian University 

Albert S. Thompson, Associate Professor of Education, Teachers
College, Columbia University

CONVENTION COMMITTEE CHAIRMEN, 1949-50 

James A. McClintock, Director of Personnel, Brothers College,
Drew University, Local Arrangements 

William M. Wise, Dean of Student Personnel, University of Florida,
Exhibits 

Helen M. Voorhees, Appointment Bureau, Mt. Holyoke College, 
Information 

Mary D. Bigelow, Chairman of Advising, Stephens College, Meals 

Anna M. Hanson, Director of Placement, Simmons College, Hospitality

Robert H. Shaffer, Assistant Dean of Students, Indiana University, 
Publicity 

John H. Cornehlsen, Jr., Professor of Education, Department of 
Guidance and Personnel Administration, New York University, 
Meetings 

Clark I. Davis, Dean of Men, Southern Illinois University, Placement


ACPA COMMITTEE CHAIRMEN, 1949-50 

Kate Hevner Mueller, Indiana University, Research 
Clifford Houston, University of Colorado, Standards 
George A. Pierson, University of Utah, Nominations
Lyle W. Croft, University of Kentucky, Membership 
Paul McMinn, University of Oklahoma, Publications 
Ralph A. Carli, Stevens Institute of Technology, International 
Relations 

C. H. Reudisili, University of Wisconsin, Proceedings 
Ralph Bridgman, Merrill Palmer School, Public Recognition 
Wray H. Congdon, Lehigh University, Local Arrangements 



EDITORS’ FOREWORD 


The twenty-third annual meeting of the American College Personnel
Association was held at Atlantic City from March 27 to 30, 1950,
in cooperation with the constituent members of the Council of
Guidance and Personnel Associations. The convention program was
organized to develop the theme, “The Personnel Profession: Achievements
and Objectives.” Twenty papers were read, eight panel discussions
were presented, and two business meetings were held by ACPA
members during the four-day period. Eighteen of these papers appear
in this publication of the Proceedings. Two papers were not prepared
for publication by their authors. The panel discussions held during
this convention were not recorded for these proceedings.

On Tuesday, March 28, the members of ACPA participated in the
program sponsored by the Council of Guidance and Personnel Associations.
At the morning session President Howard R. Beattie made
his Annual Report, after which Thelma Mills, ACPA President, and
the members of her Committee to Consider Unification made an
important proposal to reorganize CGPA into an International Personnel
and Guidance Association. At 11:00 a.m. the convention was
broken down into many small groups where the proposal was explained
further and discussed freely. The convention then reconvened and
accepted the recommendation of the Committee on Unification that
the reorganization proposal be taken back to the members of the
various Associations for their consideration during the coming year
and that final action be postponed until the 1951 convention.

At the “Council Day” luncheon meeting, Mr. Laurence A. Appley,
President of the American Management Association, discussed the
subject, “Greater Utilization of the Educator’s Knowledge of Human
Potential.” In the afternoon, Dr. John E. McGowan, Lecturer in
Psychiatry at New York and Columbia Universities, addressed the
convention on the topic, “Psychiatry for Counselors.” Later Mr.
William Line, Professor of Psychology at the University of Toronto,
spoke on the subject, “The Scientific Status of Counseling.” The
papers presented by Mr. Appley and Mr. Line will appear in the
Journal of the National Association of Deans of Women.

The American College Personnel Association members present at
the convention were informed by the Membership Chairman, Mr.
Lyle Croft, that our organization is now approaching a total membership
of one thousand college personnel workers. With this increase
of almost three hundred associates during the past twelve months,
we are looking forward to another successful year and to the twenty-fourth
annual meeting of the Association which will be held at Chicago
March 26 to 29, 1951.

George A. Pierson 

University of Utah 

Catherine M. Northrop 
University of Denver 




DEVELOPMENTS IN COUNSELING BY FACULTY 

ADVISERS 


(An Abstract) 

CARROLL L. MILLER 

Assistant Dean of the College of Liberal Arts, Howard University, 
Washington, D. C. 

Significant among the recent trends in higher education 
is a growing recognition of the obligation of the college or 
university to each student accepted for admission. One result 
of this development is an increased awareness of the need for 
“individualization” and the necessity for expanding the 
facilities for handling the entrant as a person. 

The organization of these services may vary from institution 
to institution, but the aim is basically the same; namely, to
assist in the development of the potentialities of the individual 
within the framework of the philosophy of the school. For the 
realization of this aim, the college relies in part on its counseling 
and advisory facilities, which normally include the services of 
faculty advisers, residence hall counselors, and specialists in 
counseling and clinical techniques. 

The success of any program of counseling in college depends 
in a large measure upon the effectiveness of faculty advisory¹
services, for the bulk of the counseling problems on a campus 
are those needing educational guidance, and the faculty 
adviser is frequently sought by the student when questions 
relating to academic matters arise. 

In order to determine the role played by faculty advisers
in the student personnel programs of institutions of higher
learning in the United States, a Questionnaire was sent to 115
selected colleges and universities. Replies were received from
90 of these schools.² The instrument was devised specifically to
discover (1) the methods used to select faculty advisers; (2)
the services performed by faculty advisers; (3) the methods of
orienting and training faculty advisers.

¹The term, faculty adviser, is used here to refer to the general adviser rather than
the major field adviser. It is felt that the results of an effective faculty advisory service
during the freshman and sophomore years will decrease to a minimum the problems
for the major field adviser in subsequent years.

Faculty advisers were available in 86 of the 90 schools 
reporting. These advisers were selected by the individuals or 
groups listed below: 


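Selections made by                                Number of Schools

Dean of the College .......................................... 24
Heads of Departments and Dean of College ..................... 15
Dean of College and Coordinator of Counseling ................  8
Heads of Departments .........................................  6
Coordinator of Counseling ....................................  6
Dean of Students .............................................  4
Dean of College and Faculty Committee ........................  3
Heads of Departments and Coordinators of Counseling ..........  3
Board of Advisers ............................................  2
Dean of Freshmen .............................................  2
Dean of Men ..................................................  2
Student Groups ...............................................  2
Chairman of General Education Program ........................  1
Coordinator of Counseling, Dean of Men, and Dean of
  Women ......................................................  1
Dean of College, Dean of Men, Dean of Women, Registrar
  and Director of Guidance ...................................  1
Dean of Students and Heads of Departments ....................  1
Dean of Students for College, Staff, Chairmen, Dean of
  College, Dean of Students for University ...................  1
Dean of the University .......................................  1
Heads of Departments, Dean of the College, and Coordinator
  of Counseling ..............................................  1
President of the College .....................................  1
President, Dean and Faculty Committee ........................  1

Total ........................................................ 86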


²Of the 90 Institutions from which Questionnaires were received 76 were coeducational;
79 were members of the American Association of Universities; approximately
half had faculty members belonging to ACPA. These schools were distributed as
follows: New England States 9, Middle Atlantic States 19, Central States 40, Southern
States 16, Western States 6.

























In selecting advisers four characteristics were taken into 
account and were reported as follows: 

Genuine interest in and understanding of students—mentioned
73 times.

Willingness to take time to advise students without additional
compensation—mentioned 14 times.

Knowledge of course requirements, curricula, and regulations—mentioned
8 times.

Interest in total educational program—mentioned 4 times.

The services performed by faculty advisers in 86 schools 
ranged all of the way from assisting students in selecting 
courses to helping students gain insight into their personal 
problems. The activities reported in which advisers engaged 
are listed below: 


Activities                                         Number Reporting

Assistance in the selection of courses ....................... 86
Assistance in long range academic planning for a career ...... 83
Explanation of academic regulations .......................... 77
Referrals to other agencies .................................. 70
Follow-up of academic progress through periodic reviews
  of records ................................................. 63
Exploration of personal problems ............................. 51
Assistance in securing aids to academic adjustment ........... 33
Entertainment (social) of advisees ...........................  1
Rating of each advisee on citizenship ........................  1
Assistance in personalizing freshman week ....................  1


Some form of in-service training for faculty advisers was
provided in 53 of the institutions reporting. Periodic meetings
in which common problems were discussed were the most
frequent in-service training method. Other techniques used
were workshops, case conferences, organized summer courses,
and faculty advisers’ handbooks.

In the majority of colleges and universities (75) no reductions 
in teaching load were made to compensate for the time spent as 
advisers. Additional compensation was provided faculty 
advisers by eight institutions; one institution freed advisers 
from committee work; and another institution provided 
additional compensation and reduced the teaching load. 













A few colleges and universities have made definite efforts to 
improve their faculty advisory services, Among these are 
Stephens College, where an elaborate adviser’s training program 
is in effect; Ohio State, where advisory services have been 
centralized; Colgate, where graduate students are used to 
supplement the services of faculty members; and San Francisco 
State, where an instructor-advisory plan is now in operation.

While there is greater concern for the welfare of the individual 
college student today than was true a generation ago, indifference
still characterizes the efforts of many faculty advisers.
Among the problems yet to be solved are the following: How
can faculty advisers be used most effectively? That is, how 
can their services be made a part of the student personnel 
program of the institution? What are the personal characteristics
of an effective adviser? How important is training in
developing an effective faculty adviser? To what extent should 
faculty advisers attempt to counsel students regarding their 
various adjustment problems? And, finally, what consideration 
can and should be made to compensate faculty advisers for 
their additional responsibilities? 



DEVELOPMENTS IN RESIDENCE HALL COUNSELING 

MERLE M. OHLSEN 

Associate Professor of Education, Washington State College, Pullman, Washington 

Have you been following the professional literature which 
has been written on the topic of residence-hall counseling? 
If you have followed it carefully over the last twenty years, 
you have found that it has not consumed much of your time. 
It is true that writers in the field of student personnel work do 
mention the topic occasionally. They usually agree that the 
residence-hall program has an important place in the student 
personnel program. 

In preparing this paper it occurred to me that there is one 
general objective of dormitory counseling. It is to help the 
student to better understand himself and his relations with 
people through his day-to-day contacts with interesting and 
friendly individuals who can work and plan with him. The 
purpose of this paper is to consider some of the issues involved 
in achieving this broad objective. Specifically, the following 
issues will be considered: 

1. How are present dormitory counseling services affected
by the historical developments in student housing?

2. How does the dormitory staff fit into the general framework
of counseling services?

3. What are some of the services which the dormitory 
counselors can provide? 

Let us consider these issues in the order in which they were 
stated. Stewart 1 reported that the problem of student housing 
dates back to the very beginning of the great European
Universities. This fact in and of itself is not so important, but her
account of the gradual shifts in the student’s role in house 
government does have a direct bearing upon student-staff 
relationships. She traces the change as follows: “. . . in the
mental flowering and freeing of the Renaissance, residence
halls were largely student governed; and that little by little
as learning formalized, authority for the conduction of the
life in them was removed from student hands until it rested
completely with the college authorities.” 2 If we can accept the
statements of philosophy in present-day dormitory staff
manuals as an indication of change in practice, it would appear
that we are now moving in the direction of more democratic
student-staff planning within dormitories.

1 Helen Q. Stewart. Some Social Aspects of Residence Halls for College Women.
New York: Professional and Technical Press, 1941, p. $.

In any case, we cannot treat the development of the philosophy
of student personnel work as it pertains to residence-hall
groups as if it were independent of the rest of the student 
personnel program. Relative to the beginning of student 
personnel work in this country Cowley 3 said that the first 
college dean seems to have given most of his attention to 
disciplinary problems. Now if we recall that the early dean 
often lived in a dormitory as proctor, we see even more clearly 
why there was a staff-dominated relationship. It is probable 
that the pattern which was set in these early programs may 
still plague our dormitory counseling programs today. 

It is not likely that the dormitory counselor does his best 
work if he still holds the “papa or mamma knows best” attitude. 
We need professional leaders who can work with the students 
in helping them make plans rather than leaders who devise 
the plans and attempt to sell them to the students’ elected 
leaders. 


The Problem of Dormitory Staff 

What has just been said brings to the fore the second issue— 
that of dormitory staff. It is a problem to find staff members 
who have the training and the personal security which allows 
them to work with the students democratically. Orme 4 said that
being a good disciplinarian and “nice woman who loves young 
people” are no longer adequate qualifications for dormitory 
heads. Whereas they may have been adequate qualifications
for the housemother’s position, they are not adequate qualifications
for the position of head counselor. Merely liking
young people and being kind to them does not qualify the
dormitory counselor to provide the kind of services which we
shall consider here.

2 Ibid., Stewart, p. 93.

3 W. H. Cowley, “Some History and a Venture in Prophecy.” Trends in Student
Personnel Work, E. G. Williamson (Ed.). Minneapolis: Minnesota Press, 1949.

4 Rhoda Orme, Counseling in Residence Halls, A Report of a Type C Project, Doctor
of Education Degree, Teachers College, Columbia University, 1948.

It is true that some of the colleges and universities have met 
this problem. However, we still do have many housemothers 
as head counselors. Some schools place teaching faculty in the 
dormitories as head counselors; others use a combination of 
teaching staff and undergraduate assistants. Still others staff 
the living units with graduate counselors. A few employ 
well-trained, full-time head counselors. The full-time teaching 
staff member probably is too busy to give the job the time it 
really takes. Moreover, the job usually demands his attention 
at the time of day when he prefers to be doing something else. 

For the schools which do have the doctoral program, the 
mature doctoral candidate in student personnel, who has 
had personnel experience, appears to be the most promising 
candidate for the head counselor position. First, he has special 
training. Second, he needs the experience and he is motivated 
to do a good job. Third, he will be on the job at least three 
years. His services can be supplemented with upper-class 
undergraduate students. The young graduate student who is 
working on a half-time assistantship rounds out the staff 
nicely. I shall not treat either the problem of the number of 
staff members needed in a dormitory or the exact qualifications 
each should have. However, I shall define a given dormitory 
situation, describe the staff, and treat the problem of services 
in relation to these factors. 

A Specific Dormitory Situation 

Now let us think about the specific situation. I shall assume 
that the hall houses one hundred students. It has a full-time 
Head Counselor. To assist him in the more specialized services 
he has a Counseling Assistant who works half time in the 
residence-hall program and does half-time graduate work. 
There are also five carefully chosen undergraduate assistants 
who serve without pay. They act as liaison workers between 
the students and the paid staff. The Head Counselor has
overall responsibility for the dormitory. Naturally, it is up to
him to develop a training program for his own staff. Not 
only has he the responsibility for training the six staff members 
described above, but he is also responsible for helping each 
of the dormitory officers to define and learn how to carry out 
the duties of his office. 

It is obvious that we should know more about the given 
situation before we attempt to plan a counseling program for 
it. We should have more information about the students who 
live there, the kinds of people the individual members of the 
staff are, and the arrangement of the dormitory itself. But 
to know that these elements are important suffices for our 
purposes here. 

In passing, something should be said about behavior problems.
I believe that we should help students to take responsibility
for their own actions. If a student is accused of breaking
the social code for the house, his case should come to the 
attention of the Head Counselor. He, in turn, would help 
the house officers to collect the facts about the alleged violation. 
On the basis of the facts, the House Council would make a 
decision on the case. Should they decide that the case is something
which is too difficult for them to handle, the student
would be referred to the student-faculty discipline committee. 
In any case, the house officers should keep detailed notes on 
the case and the disposal made of it. 

Working Relationships 

Since we are thinking about the staff, probably I should 
comment on student-counselor relationships. Even in the 
residence halls in which students really have had a chance to 
experience democratic planning, the feeling between the 
Counselors and the students is different from that in the 
Counseling Center. The dormitory staff member and the 
student are personal friends. The dormitory is the home- 
away-from-home. Hence, we have more of a friend-to-friend 
counseling relationship rather than a clinical relationship. 
Here the friendly staff member tries to help the individual 
students solve problems either individually or in groups. The
staff member not only tries to help individuals, but he tries
to set up situations in which students can help each other. 

The whole problem of student-counselor relationship also 
identifies the need for more reflection on the issue of the 
student’s role in the house government. It is my own conviction 
that democratic planning not only helps to create a better 
within-house feeling, but it also stimulates greater personal 
development of the students. If we mean to work with students 
democratically we must trust them and their judgments. 
We must be willing to take chances and even to allow them to 
make mistakes. They must feel that they can settle issues 
through democratic processes and even go ahead to try a 
project which the Head Counselor has verbally opposed. 
This does not mean that the staff leader is not a participating 
member of the group. He is a member of the living group and 
as such he has the right to state his arguments in the case.
The point is that the staff member should not insist on having 
his way. Granted, some may feel that too much has been made 
of this point, but failure to reach an understanding here often 
seriously affects other staff-student relationships. It is important
that there should be established a feeling of mutual
trust—an atmosphere in which students and staff can work 
together democratically in creating and maintaining a living 
environment with greatest educational, social and cultural 
values. 


Questions of the Teaching Staff 

It is also important that the Dormitory Counselors learn to 
work with the teaching staff. Many questions have been raised 
by the teaching staff. Suppose we consider just three questions 
which I heard a staff raise recently: 

1. Just what is it that Dormitory Counselors do? 

2. Would we be able to notice any difference in our students 
if these services were discontinued? 

3. Is this the best and most economical way of providing 
these services for students? 

We will just have to admit that we do not have the answers 
to the last two questions now. That means we had better get
busy and evaluate our program. We may need these facts all
too soon. The rest of this paper will be devoted to the first 
question. Just what is it that Dormitory Counselors do? 

Duties of the Undergraduate Assistant 

The undergraduate assistant acts as a liaison between the 
students and the staff. He makes his contribution by providing 
the following services: 

1. By helping students to become acquainted in the house— 
both with the students and the staff. 

2. By becoming well acquainted with every student in his
section—knowing their special interests, abilities, and problems. 

3. By referring students for help. 

4. By knowing the student resources in the house for special 
tutorial help. 

5. By distributing information which helps all the students 
keep well informed on both house and college-wide activities 
and regulations. 

6. By helping to promote good house government. 

7. By helping to create and maintain a friendly atmosphere. 
Obviously, this undergraduate assistant would soon lose his 
opportunity for real leadership in the dormitory if he ever 
became an inspector for an autocratic Head Counselor. 

8. By recognizing morale problems early—since he works 
with a smaller group of students in the dormitory, he is able to 
help the head counselor understand sources of difficulty. 

The Dormitory Counselor's Services 

We have noted some of the things which the undergraduate 
assistant does. Now what is it that the Head Counselor and the 
Counseling Assistants do to help students? 

1. The Dormitory Counselor should make himself available 
to students when they need to talk to a friend about personal 
problems. Those of us who have worked in dormitory programs 
know that the Head Counselor and Graduate Counseling 
Assistant can expect to be visited any time of the day or night. 
The student who knocks at the door during the night probably 
is too troubled to either sleep or study. He may need no more
than personal attention at the time when things have gone
badly. He probably feels the need to talk to a mature friend. 
However, the trained Dormitory Counselor realizes that the 
student may need therapy which goes beyond the scope of his 
job and his competencies. 

2. Students want the Dormitory Counselors to help them
with their activities. The Dormitory Counselor should do more 
than merely help students with the activities they now have. 
He should try to discover the students’ interests, then organize 
small groups to meet individual needs. Some of these small 
“cell” groups give a student a chance to achieve a measure of 
security which in turn helps him find and become affiliated 
with campus-wide activities. 

3. Social programs also provide the staff with another 
chance to help individuals in groups. Such activities as dinners, 
teas and coffee hours, dances, lectures, musicales, and discussions,
all are a part of social education. These experiences
can help the student to learn to live in a group and to appreciate 
some of the cultural values which a college education should 
provide. On the other hand, it is possible for the staff to 
promote a social program which the students neither want nor 
appreciate. Under these conditions little learning takes place. 

4. Inasmuch as the dormitory staff member does have a 
chance to see a student living in a variety of situations, he can 
provide facts about the student which help others who also
work with the same student. Dormitory staff members often 
pick up information about the student's family, his personal 
problems, health, study skills, special learning problems, and 
study conditions within the house. Some of these facts which 
the Dormitory Counselor discovers also help such special 
college committees as the ones on scholarship and discipline. 

5. Dormitory workers can become acquainted with the 
students who need special help. It is important that the 
Dormitory Counselor not only recognize these students who 
need special help but that he also know the referral agencies
and techniques of referral. All of us would certainly agree 
that the Dormitory Counselor must be thoroughly acquainted 
with each agency and its service before he can make an intelligent
referral. The referral agency’s staff also has a responsibility
for working cooperatively with the Dormitory Counselors.
Status difference between these two levels of counseling often 
complicates this task. Since Dormitory Counselors are involved 
by the mere fact that they live with the student, they must be 
kept informed about the student’s progress and the part they 
can play in helping him to insure that both of these Counselors 
are giving the student integrated help. 

6. If the dormitory is to become the student’s home-away- 
from-home, then the staff must help to orient him to college. 
Ideally, the orientation to college should be started in high 
school. There should also be a college-wide program for the 
orientation of the new student. Nevertheless, the dormitory 
orientation program can be a vital factor in the student’s 
adjustment to college and life away from home. The dormitory 
staff should help the new student become acquainted with 
other students and the college program. 

7. The exit interview is another natural service of the 
Dormitory Counselor. Inasmuch as students do frequently drop 
out of school before they adjust to college work this is certainly 
a needed service in the dormitory. We should accept the 
student’s decision to drop school and allow him the freedom
he needs to talk out his decision. The very fact that we accept 
his decision to drop out and try to help him plan for the future 
often causes him to change his plans and stay in school. This is 
particularly true when his long-term plans do involve college 
training. 

8. Another problem of adjustment to college is the one of 
quality of scholarship. Actually, there may be as many as four 
elements in this problem for students: (1) developing good 
study conditions in the dormitory, (2) helping students to 
budget their time efficiently, (3) giving assistance in developing 
good study habits and study methods, and (4) improving 
reading skills. Of these four elements the dormitory staff can 
often help with the first three, but they will usually refer the
students to the reading clinic for the fourth service. 

9. And, finally, there is one other large area of service in 
which Dormitory Counselors may give help—in educational- 
vocational planning. It is true that the teaching faculty should 
do the academic counseling, and that careful vocational
appraisal should be made with the help of a Clinical Counselor.
Even so, the students do talk to the Dormitory Counselors 
about individual courses and fields of study. Hence, the 
dormitory staff member should have vocational information 
available to him. He also needs special job information on the 
fields of study available to students at the college. On the other 
hand, the dormitory staff should also refer the student to the 
college’s vocational information library. He certainly will 
want to refer some of the students to a more specialized 
counseling service for testing and counseling. 

Then, there are certain counseling services which Dormitory 
Counselors can provide. Obviously, not every dormitory staff 
will be able to provide help in all of these nine areas. The 
services the staff in a particular residence hall provides must 
be determined by the quality of the staff and the services 
provided by the other student personnel agencies. And since 
the residence hall program is just one part of the whole college 
program, I decided to conclude this paper with four questions 
for which answers are still needed: 

1. What are the in-service training needs of your Dormitory
Counselors? 

2. Are we making use of the personnel techniques developed
by other agencies and are we adapting these techniques for 
use in residence hall programs? 

3. Is this the best and most economical way of providing the 
counseling services defined in this paper? 

4. Would the teaching staff notice any change in the students 
if dormitory counseling services were discontinued? 

REFERENCES 

1. Borreson, B. J. “Student Housing as Personnel Work.” Trends
in Student Personnel Work, E. G. Williamson (Ed.).
Minneapolis: Minnesota Press, 1949.

2. Cowley, W. H. “Some History and a Venture in Prophecy.”
Trends in Student Personnel Work, E. G. Williamson (Ed.).
Minneapolis: Minnesota Press, 1949.

3. Dammon, A. H. “Residence Halls for Students.” Trends in
Student Personnel Work, E. G. Williamson (Ed.).
Minneapolis: Minnesota Press, 1949.

4. Hayes, Harriet. Planning Residence Halls. New York: Bureau of
Publications, Teachers College, Columbia University, 1932.

5. Lind, Melva. “The College Dormitory as an Emerging Force in
Education.” Association of American College Bulletin, XXXII
(1946), 529-538.

6. Lloyd-Jones, Esther McD. and Smith, Margaret R. A Student
Personnel Program for Higher Education. New York:
McGraw-Hill Book Company, 1938. Chapter XII.

7. Orme, Rhoda. “Counseling in Residence Halls.” An Unpublished
Report of a Type C Project, Doctor of Education Degree,
Teachers College, Columbia University, 1948.

8. Residence Halls for Women Students. Washington 6, D. C.:
National Association of Deans of Women, N.E.A., 1947.

9. Sifford, C. S. “Evaluating a Residence Hall Counseling Program.”
School and Society, XIX (1949), 452-453.

10. Sifford, C. S. “Residence Hall Counseling.” College and University
Business, III (1947).

11. Survey of Land-Grant Colleges and Universities. Bulletin 1930,
No. 9, Volume I. Washington, D. C.: U. S. Office of Education.

12. Stewart, Helen A. Some Social Aspects of Residence Halls for
College Women. New York: Professional and Technical Press,
1942.



DEVELOPMENTS IN COUNSELING BUREAUS AND 

CLINICS 


ROYAL B. EMBREE 

Assistant Director, Counseling Bureau, University of Texas. (Paper read by Gordon
Anderson, Director, Counseling Bureau, University of Texas) 

Introduction 

For many years it has seemed to the writer that the first 
need in any speech or article dealing with student personnel 
work is for a clarification and definition of the very title 
itself. This notion was reinforced by Dr. Cowley’s justifiably 
choleric variation upon the semantic theme in the Minnesota 
publication Trends in Student Personnel Work. 1 Therefore, the
beginning effort in this paper will be aimed at the provision 
of some basic premises for the consideration of “Developments
in Counseling Bureaus and Clinics.”

One of the most striking and productive phases of the 
personnel-guidance mental hygiene movement during the past 
two decades has been the establishment of a large number of 
comprehensive agencies, often on college and university 
campuses, which were designed to provide professional assistance
to people through the channels of self-appraisal
and counseling. These organizations, whether they arose 
under the sponsorship of the community or of an educational 
institution, have made a tremendous contribution to the 
meeting of individual developmental needs, not only through 
their direct service to people, but also through their emphasis 
upon professional training of staff members, scientific methodology
and fundamental research. An effort will be made
in the following section to trace the origin and growth of 
centralized psychological agencies in colleges and universities. 
The important points to consider now are the facts that 
(1) these agencies developed with a wide variety of titles,
and (2) these agencies developed in direct response to the
evident needs of their potential clientele and not as planned
aspects of total institutional student personnel programs.

1 Cowley, W. H. “Jabberwocky Versus Maturity.” Trends in Student Personnel
Work. Minneapolis: University of Minnesota Press, 1949. Pages 342-349.

This paper will be confined to a consideration of counseling 
agencies which have been developed by colleges and universities.
There has been little agreement with respect to the
names given to these organizations. An opportunity to study 
this matter of nomenclature was provided by the excellent 
directory of counseling agencies recently released by the 
Ethical Practices Committee of the National Vocational 
Guidance Association. 2 Fifty agencies sponsored by institutions 
of higher learning were included in the Directory. Of this
number, twenty-three, or nearly half, used the term center 
in their listed titles. The next most popular designation was 
service, used by six institutions. Four listed agencies had titles 
which included the word bureau. In three cases, no title was 
stated. Other descriptive titles and their incidence were 
as follows: department (3), clinic (2), office (2), division (2),
unit (2), laboratory (2), and institute (1).

The counseling agencies listed in the Directory included 
many, but by no means all, of the more active and better-known 
organizations in the colleges and universities of this country. 
It is clear that the terms used in the title of this paper are 
among the less popular ones and that preference is tending 
overwhelmingly toward the use of center in the description of 
these counseling agencies. It seems reasonable to predict that 
this preference will continue, since center has been very widely 
used in describing facilities for the counseling of veterans 
which are rapidly being converted into general college 
counseling organizations. 

2 Ethical Practices Committee, National Vocational Guidance Association. 1951
Directory of Vocational Counseling Agencies. St. Louis, Missouri: Washington
University, 1950. 98 p.

The Directory also provides some interesting information
concerning the second major point made above. Only seven
of the fifty listed agencies appear to restrict their clientele
to the students of their parent institutions. (It is obvious that
agencies which do so limit clientele would be less likely than
others to list themselves in the Directory.) Approximately
60 per cent of listed centers are open to adolescents and adults
outside the institution and about 20 per cent are open to 
outside clients of all ages and levels of schooling. The median 
listed fee for non-institutional cases falls between $20 and $25.
Twenty-seven of the fifty agencies indicate that they counsel 
veterans under contract with the Veterans Administration. 
It is clear that the majority of these centers have been developed
to serve the needs of a clientele extending well beyond
the limits of the institutions which sponsor them. This extension 
of facilities represents an important public service, but, by 
strict interpretation, it carries the counseling service beyond 
the logical limits of a student personnel agency. On the other 
hand, however, thirty of these listed centers provide free 
service to the students of their parent institutions, indicating 
that they have been developed, at least in part, to meet 
intramural student needs. 

This prevalent dualism in collegiate counseling centers 
raises an important point. Many of these organizations are 
actually student-personnel facilities to only a partial degree, 
and this is especially true of some of the most extensive bureaus, 
centers and services. Other functions such as clinical work 
with children, general adult counseling, industrial consultation, 
examining, test-scoring and educational research may well 
occupy the greater share of the agency’s time and personnel. 

It is proposed that the subject of this paper be reworded as 
“The Central Counseling Facility for Students in Colleges and
Universities,” defined as follows:

A central counseling facility is an integral part of a student
personnel program which provides an opportunity for specialized
counseling by professional workers with access to the various
technical devices which are being developed in the field of
counseling.

Such a facility may be part of a very extensive bureau or 
psychological service center. It may as well be the counseling 
office of a small liberal-arts or junior college, manned by a 
single professionally trained clinical counselor. Actually, there 
may be several central counseling facilities inside the same 
university, each representing a nuclear development within
some subdivision of the total institutional structure. The
size of these centers, and the variety of services provided, will 
cover a broad range and will be conditioned by the characteristics
of the institutions which develop them. The crucial
point of the concept offered here is that the central counseling 
facility would be recognized by definition as an integral part 
of the total student personnel program of the institution, 
and would be considered separately from the other worthy 
functions often allocated to agencies which render psychological 
services in colleges and universities. It would seem probable 
that such a line of thought should tend to eradicate the rather 
insular characteristics of many counseling centers, thereby 
improving their integration with other aspects of the institution's
total program of services to students.

The Origin and Growth of Central Counseling Facilities in
Colleges and Universities 

Counseling centers in colleges and universities have tended 
to develop around the interests and stimulations of certain 
individuals and, in most cases, have been organized well in 
advance of the growth of generalized student personnel programs
in their parent institutions. The result has been a
widespread effort to meet individual needs, institutional and 
otherwise, by providing the best possible services in the 
areas of self-appraisal through measurement and/or the 
counseling of individual clients. 

Perhaps the most satisfactory framework for considering 
the development of these counseling centers has been provided 
by E. G. Williamson in the first chapter of his book, Counseling
Adolescents. 3 He proposes that the two great emphases upon
counseling to date have been (1) counseling as vocational
guidance and (2) counseling as psychotherapy. A tracing
back of the factors involved in the development of central 
counseling facilities in institutions of higher learning will show 
that they have tapped these two principal sources. 

3 Williamson, E. G. Counseling Adolescents. New York: McGraw-Hill Book
Company, 1950. 548 p.

The emphasis upon vocational guidance was apparent
in the organizations developed in communities and school
systems to meet needs in this area. The early period of organization
has been effectively described by Reed. 4 Probably
the earliest establishment was the Vocational Bureau of Boston 
in 1909 under the direct influence of Frank Parsons. This 
type of service was reproduced many times in other school 
systems, and, shortly after the close of World War I, there 
were numerous people in colleges and universities who wished 
to make this vocational-educational service available to 
students in general. These people were usually employed 
by departments of psychology or educational psychology 
and thus it happened that the vocational and educational 
services in which they believed tended to develop within the 
confines of these departments. 

The emphasis upon personal problems and therapeutic
counseling has also exerted a great influence upon the development
of central counseling services in colleges and universities.
Members of psychology departments, and especially clinical
psychologists, were concerned at a very early date with the
individual emotional and developmental problems of college
students. Their efforts to meet needs in this area were crystallized
under departmental sponsorship and often grew into
independent central counseling facilities. In a few cases,
leadership in personal counseling originated with and was supported
by the student health service of a college or university.

A few specialized references may provide body and color 
to this discussion. In 1934, Williamson reported on the organi¬ 
zation of the University Testing Bureau at the University of 
Minnesota in 1932. 5 He described how the Bureau was developed
to meet the increasingly complicated needs of students
and he outlined the philosophy and procedure of the service in 
clear detail. He reported that 1,932 cases had been handled 
during the period 1932-1934, and that these individuals 
represented a reasonably random sample of the university 
population. This central counseling facility grew out of the
interest and stimulation of Donald G. Paterson who brought
his war-sharpened breadth of thinking to the Minnesota
campus. 6

4 Reed, Anna Y. Guidance and Personnel Services in Education. Ithaca, New York:
Cornell University Press, 1944. 496 p.

5 Williamson, E. G. “Biennial Report of the University Testing Bureau, 1931-1934”
P. 343-351, Report of the President for the Biennium 1932-34. Minneapolis: University
of Minnesota, 1935.

Another type of development is represented at Ohio State 
University. Stogdill, who was a member of the Psychology 
Department, has reported upon the treatment of cases which 
dated back into the 1920's. 7 The writer can vouch personally 
for Dr. Stogdill’s excellent work and he recalls having sat in 
on a case of hypnotherapy handled by Dr. H. H. Goddard, 
but he remembers that student services were far from integrated 
since he also worked during 1930 with Dr. Louella Cole in a 
program designed to assist students in the improvement of 
reading and study habits. The clinical approach to student 
problems at Ohio State moved into the era of Rogers and still 
exists as a distinctly parallel facility to the Occupational 
Opportunities Service which is more clearly orientated to 
educational-vocational problems. 

McKinney has described the foundation and development 
of the "College Adjustment Clinic” at the University of 
Missouri. 8 This agency was developed about 1938 as an 
outgrowth of the Student Health Service. It is understood 
that it exists at present in tandem with a central counseling 
facility of a clearly educational-vocational nature which was 
developed to meet the demands of veteran advisement. The 
same dichotomy of emotional and vocational-educational 
services may also be found at the Universities of Chicago and 
Oklahoma, and elsewhere in the country. 

A more comprehensive service is described by Bailey, 
Gilbert and Berg at the University of Illinois. 9 This central
counseling facility was designed from the beginning to utilize 
the services of clinical counselors and also of trained faculty 
counselors who were detailed to educational-vocational work 
with students. 

6 Williamson, E. G. Trends in Student Personnel Work. Minneapolis: University of
Minnesota Press, 1949. 417 p.

7 Stogdill, E. L. “A Survey of the Case Records of a Student Psychological Consultation
Service Over a Ten-Year Period,” Psychological Exchange, III (1943), 129-133.

8 McKinney, Fred. “Four Years of a College Adjustment Clinic. I. Organization of
Clinic and Problems of Counselees.” Journal of Consulting Psychology, IX (1945),
203-217.

9 Bailey, H. W., Gilbert, William M. and Berg, Irwin A. “Counseling and the
Use of Tests in the Student Personnel Bureau at the University of Illinois.” Educational
and Psychological Measurement, VI (1946), 37-60.

In conclusion, it may be pointed out that the development
of central counseling facilities in colleges and universities 
has resulted from an emphasis upon vocational guidance, upon 
personal counseling, or upon a combination of these two 
factors. The actual patterns of development in most cases 
have been highly individualistic—dependent upon the per¬ 
sonalities and viewpoints of the principal influencers of growth. 
The relative newness in the student personnel scene of college 
counseling services and their close identification with the 
persons who founded and developed them account for the 
wide variations which exist today in matters of philosophy, 
function and policy. 

Present Trends in the Development of Central Counseling 
Facilities in Colleges and Universities 
The most striking trend in the development of counseling 
services in colleges and universities is the rapidity with which 
these agencies are being activated on the campuses of this 
country. It was mentioned above that the 1950 Directory of 
Vocational Counseling Agencies listed fifty counseling facilities 
sponsored by institutions of higher learning. In a few moments, 
the writer was able to think of twenty-five active college 
counseling services which he knows of personally and which 
were not included in the Directory. Surely, there are many 
more. If the broad definition given for a central counseling 
facility be accepted, one could add to the list a large number of 
strictly intramural but professionally manned offices in smaller 
colleges and universities. The development of these counseling 
centers and services represents one of the most active areas of 
student personnel work. 

Probably no one factor has contributed more to the expansion
of college counseling services than the Veterans Administration 
College and University Guidance Program. The implications 
of this extensive subsidization of counseling facilities for 
veterans on college campuses were discussed by Dreese at the 
1949 meeting of the American College Personnel Association. 10
He reports that there were 415 centers in cooperating institutions
at the peak of the program and that an estimated 1,000,000
cases had been counseled by March 1, 1949. He endeavored to
find the attitude of college administrators toward these services
and their plans for the centers following the termination of
government contracts. There seems to be no doubt that the
V. A. guidance program has been a tremendous stimulus to
counseling. Furthermore, about four-fifths of 154 institutions
intended to continue the centers as part of their college personnel
programs even though only half of these schools had
maintained a central counseling facility prior to the establishment
of the V. A. service.

10 Dreese, Mitchell. “Present Policies and Future Plans of College Guidance Centers
Operating under V. A. Contracts—A Survey of the American Council on Education.”
Educational and Psychological Measurement, Part II, IX (1949), 558-578.

Another very vital trend is the rapid professionalization 
of the staffs of counseling services in colleges and universities. 
The position of clinical counselor has been clearly defined and, 
in some institutions, is officially established in terms of training 
standards, personal qualifications and duties. Reference to the 
above-mentioned Directory indicates that very high standards 
are being maintained by colleges and universities in their 
selection of directors and professional personnel for central 
counseling services. Thirty of the fifty listed college centers 
are led by persons with doctoral degrees and only one director 
was without some advanced degree. These directors included 
thirteen people with ABEPP diploma, twenty professional 
members of N. V. G. A., and some thirty-four Fellows or
Associates of the American Psychological Association. The 
professional staffs of these centers included approximately 200 
counselors, forty clinical psychologists and 100 psychometrists. 
The Directory provided opportunity to indicate how many 
professional employees were certified (Professional member 
N. V. G. A., Associate or Fellow of Division 17, A. P. A.,
diploma of ABEPP, State certification). Approximately 50 per 
cent of the counselors, 80 per cent of the psychologists and 
17 per cent of the psychometrists were designated as certified 
personnel.

A very important development is represented by the growing 
use of central counseling facilities in the training of graduate 
students who plan to be clinical counselors. Carefully planned 
and supervised internship and practicum experience in counseling
centers have become the crowning factors in counselor
training in a number of institutions. The central counseling
facility, regardless of size, can also make a real contribution 
to on-the-job training of faculty counselors. In some situations, 
a planned system of rotating faculty members through tours of 
duty in the central service has vastly improved their training 
as semi-professional counselors. 

There is no need to elaborate upon the trend toward increas¬ 
ing emphasis upon student personnel research in college 
counseling agencies. They are admirably situated and ex¬ 
cellently staffed for this purpose. Already, the college personnel 
movement owes a mighty debt to certain of the more estab¬ 
lished counseling services which have produced a large amount 
of highly significant research in connection with their studies 
of students and counseling techniques. There is an unlimited 
future for development in this area, but careful programming
and cooperative planning by counseling services must be
achieved if optimal results are to be obtained.

The rapid expansion of special services in college counseling 
agencies is another characteristic of present development. 
Specialized counselors are being provided to assist students 
in such areas as reading, study, human relationships, prepara¬ 
tion for marriage and marital adjustment. This growing 
tendency toward specialization results in a sort of clinical 
approach in which several experts share in the analysis and 
counseling of the individual when this is demanded by the 
situation. The field of counseling has become so complex 
and its literature and techniques so extensive that a certain 
amount of specialization is necessary. However, caution 
should be exercised in this connection for overspecialization 
could dangerously threaten the close personal association so 
important to a satisfactory counseling relationship. 

A final tendency, and a very significant one, is the movement 
toward the improved integration of central counseling services 
with the other phases of the total student personnel program. 
There is much to be done in this area, especially in the case of 
more insular counseling agencies. The task is simpler with 
smaller, more flexible central counseling facilities. This matter 
should be carefully considered by the many institutions which 
are converting their counseling centers for veterans into
student agencies, since there will in such instances be no
established interests and policies to obstruct progress toward 
the integration of all student personnel services. 

The Functions of a Central Counseling Facility 

This paper will be concluded by the presentation of a 
schematic system for outlining the functions of central counsel¬ 
ing facilities in colleges and universities. 

The responsibilities of such a counseling service may be 
represented effectively by a pyramid with a tri-lateral base. 
This sort of diagramming appears reasonable, since the central 
counseling facility can assume a very vital and focal position 
in the total personnel program of a college or university. 
The significance of this position is enhanced by the growing 
tendency to consider counseling as a basic educational process, a 
viewpoint which has recently been strongly emphasized by 
Williamson. 11 

Three functions or services are suggested by the base of this 
pyramid. The first, and perhaps the most important, is Side A, 
which represents direct, personal assistance to students through 
the media of self appraisal and/or counseling. There need 
be little concern regarding this function, for efforts to meet 
individual needs have been a characteristic aspect of college 
counseling services since their origin. 

Side B of the base is the essential function of training. 
There are at least three principal areas of training to which 
the central counseling facility can and should contribute. 
One is the continuous responsibility for stimulating and 
up-grading the staff of the center itself through organized 
programs of on-the-job training. Another is the training of 
various counselors in the institution (usually faculty or resi¬ 
dential) who are contributing to the total job of individual 
work with students. The third is the task of providing an 
opportunity for internship, or practicum experience, for 
graduate students who are specializing in the field of counseling. 
This need will arise only in the larger colleges and universities, 
but when it is possible, the integration of counselor-training
and central counseling activity can make a real contribution
to both training and counseling.

11 Williamson, E. G. Counseling Adolescents. New York: McGraw-Hill Book Company, 1950. 548 p.

Side C of the base is the function of planned assistance to the 
other agencies in the institution which are engaged in the 
general task of counseling students. This represents the most 
neglected side of the figure. Little progress can be made in this 
direction until the problems of general integration mentioned 
above have been worked out. However, it is obvious that the 
central counseling facility, through its access to personnel 
data and through the insights and experiences of its staff, 
can render invaluable assistance to other counselors in the 
institutional personnel program. 

It is proposed that the altitude function of this pyramid be 
considered as deliberately planned and programmed research. 
The scientific study of students, and of the efficacy of methods 
used to assist them, will give body or volume to the entire 
program suggested above. Research may be directed at any or 
all of the three basal functions outlined: service to students, 
training, or assistance to the general and non-professional 
staff of counselors. The absence of this research emphasis 
reduces the central counseling facility to a plane surface, 
without body or volume. The applications of research can 
vastly enrich any of the approaches which are made to serving 
the three functions of the central counseling facility which 
have been outlined here. 

In conclusion, it may be stated that the maximal value of 
central counseling facilities can be attained from the filling out 
of the pyramid suggested in this paper. There is nothing about 
the representation which needs to be conditioned by the size 
or number of employees of central facilities. The small central 
facility, manned by one counselor, can fill out the pyramid as 
effectively as the great college counseling center. The important 
facts are that the central counseling facility should contribute 
to (1) service to students, (2) training, and (3) assistance to
extra-center personnel who are counseling students, and that 
there should be a dominating scientific approach to all that is 
undertaken in these areas. 



Presidential Address 
NO VAIN IMAGININGS 


THELMA MILLS 

Director, Student Affairs for Women, University of Missouri 

There is a fable about an ancient King, who, troubled by 
the economic woes of his people, called upon the economists 
of his kingdom for advice. Confused by their conflicting 
theories and counsel, he commanded them to prepare a short
and simple text on economics for him. After many months 
they brought him many volumes replete with charts and 
graphs. In fury, the King banished half of the economists and 
commanded the other half to produce a text which he could 
understand. One after another they made reports that went 
over his head, and one after another they went into exile. 
Finally, all but one economist was gone. In fear and trembling, 
this last economist appeared before the King. “Your Majesty,” 
he quavered, “I have reduced this subject of economics to a 
single sentence. In nine words I will reveal to you all the 
wisdom to be distilled from all the economists who once 
practiced in your realm: ‘THERE IS NO SUCH THING AS
A FREE LUNCH!’” 1

1 From an article in Steelways by William J. Grey.

As I speak to you today I am much like the last economist
because, by asking the guests at the head table to join us and
pay for their own luncheon, I have proved to them that ACPA
economics is no less rigorous. In another way I resemble the
last economist, because I have set for myself the task of
presenting a composite picture of the aims and aspirations
of the presidents of ACPA during the past two decades. From
the study of these reports came my title, “No Vain Imaginings,”
for I found that not only were they sound in their thinking,
but, also, profound. They did not vainly hope for their plans
to be made realities, as you, too, will see in the next minutes of
presentation. So settle back and prepare to enjoy a family
reunion where we again gather together after the wars (personal, 
professional and actual) to evaluate what we have been doing. 
A reunion always calls for introducing some of the older 
members to the newer arrivals in the family circle, as well as 
to the guests, and our ACPA reunion for the fourth time at 
“Our” Atlantic City palatial home is no exception to the rule. 
To introduce all of the new family members to the old would be 
impossible, for our family has grown from a recorded ninety in 
February, 1932, to 894, paid as of March, 1950. May I recognize 
the 16 who are still active members of the Association: 


Fredericka Belknap 
Don Bridgman 
A. J. Brumbaugh 
Frances Camp 
M. D. Helser 
J. A. Humphreys 
Esther Lloyd Jones 
Forrest Kirkpatrick 


James McClintock 
Harriet E. O’Shea 
Luther Purdom 
Helen Voorhees 
Edith Weir 
Mary A. Wegner 
Lewis Williams 
Robert Woellner 


We came of age with our 21st annual meeting in 1948, and 
so the following year our president, C. Gilbert Wrenn, had us 
analyzing ourselves to see whether in our adult life we were 
socially effective personnel workers. In "The Fault, Dear 
Brutus,” he asked us to discuss with him the psychological 
problems and temptations of college personnel workers and to 
think of some of the possible solutions. Now I am sure that our 
sixteen long-term members must have met the first of his 
prerequisites to real maturity, “have fun from our associations 
with people,” or they would not be here today, nor members, 
continuously, of the Association. 

Now let us turn our attention to the Association, ACPA, 
and see how it has accomplished the hopes and aspirations 
of its twelve presidential leaders through the years. I should 
like for the record to mention them and their schools. 


1923-25   May L. Cheney         University of California
1925-27   Margaret Cameron      University of Michigan
1927-30   Francis F. Bradshaw   University of North Carolina
1930-33   J. E. Walters         Purdue University
1933-35   Karl Cowdery          Stanford University
1935-37   Esther Lloyd-Jones    Teachers College, Columbia
1937-39   A. J. Brumbaugh       University of Chicago
1939-41   Helen Voorhees        Mt. Holyoke College
1941-44   E. G. Williamson      University of Minnesota
1944-47   Daniel D. Feder       University of Illinois
1947-49   C. Gilbert Wrenn      University of Minnesota
1949-51   Thelma Mills          University of Missouri


Twelve presidents, and may I call your attention to the fact 
that five of them have been women, elected by popular vote. 
This delineation of presidents has been for the record so that 
the younger members of the Association may have a ready 
file of reference. 

May I review for you, from the reports, what these representatives
of yours hoped for and did accomplish. In February,
1923, a group of persons interested in placement met in Chicago 
and, as a result, organized in 1924 the National Association of 
Appointment Secretaries with 79 members. From the be¬ 
ginning it was recognized that placement was only one phase of 
personnel philosophy and practice. The “personnel idea” was 
spreading in colleges, and the Appointment Secretaries’ 
Organization seemed the logical one to help pioneer in a 
growing program. Thus, a committee was appointed in 1926 to 
work with other groups, including the National Vocational 
Guidance Association, the National Association of Deans of 
Women, the Department of Superintendents, the Personnel 
Research Federation, and the National Committee of Bureaus 
of Occupations, in the planning of joint meetings. A community 
of interests rather than any thought of merging into an over-all 
organization brought these early leaders together. 

In 1929, the name was changed from National Association of
Appointment Secretaries to National Placement and Personnel 
Officers. In 1930, in Atlantic City, a new constitution was 
proposed and the following year in Detroit the name was 
changed to American College Personnel Association. The new 
constitution was adopted and sectional divisions were set up 
in Educational Counseling, General Placement, Personal 
Counseling, Records and Research, and Teacher Placement. 

The 1932 annual meeting was devoted to a “Study of 
Personnel Activities in Members of the Association.” Here I 
must interpolate that “institutions” were the first members,
hence a study of personnel activities in members was in no
way a “Dies Committee” hunt nor an F.B.I. investigation.
The declared purpose of the Association was to increase the
number of departments of personnel in Colleges and Universities
by offering free advisement with ACPA officers. Ninety-five 
Colleges and Universities were members of the Association and 
seventy-one of them returned the data which were used for 
tabulation. May I quote from the paragraph on trends: “of the 
15 college personnel departments expressing a trend regarding 
administration of work of the department, seven indicate greater 
centralization of personnel activities; 12 expressed a trend
toward better guidance; 11 reported a trend toward more 
general employment work; 6 departments expressed a trend 
toward more and better teacher placement.” 

Three items were of particular interest in the history of the 
Association during 1932: there was affiliation with the National
Association for the Advancement of Science; the Annual 
Report was published as a separate publication for the first 
time; and the Association appointed, at the request of the 
U. S. Civil Service Commission, a committee to make a study 
of opportunities for women in government, with Mrs. Chase 
Going Woodhouse as chairman of the committee. 

By 1933, we had found that prosperity had permanently
disappeared around the corner. Presiding at the tenth annual 
meeting, Jack Walters described the Minneapolis conference 
as one of quality rather than quantity, with comparatively 
few attending because of depression and reduced budgets for 
traveling expenses. It was a year devoted to the preparation 
of a clearer statement of personnel principles and functions; 
to the establishment of higher standards of professional work; 
and to the search for a practical method of judging the effective¬ 
ness of college personnel services. The trend toward effective 
coordination of associations and agencies interested in guidance 
and personnel continued. Under the inspiring leadership of 
Dr. Harry Kitson a Coordinating Committee met with Dr. 
Keppel of the Carnegie Foundation to seek for ways of unifying 
the ten Associations “through headquarters, cooperative 
planning of programs of research, yearly activities and 
conventions, and joint publications.”

It is interesting to note that the need for a permanent
secretary was discussed at the 1933 meeting. The present 
need is even more urgent. This is one of our vain imaginings 
because of budget. Until we raise our dues, or increase member¬ 
ship far beyond the now “nearly one thousand,” the acquiring 
of a permanent secretary will remain in the planning stage. 

In 1934 Karl Cowdery stated that the purpose of the year’s 
work was "to approve more cooperative action with guidance 
and personnel groups.” The 84 individual members and iB 
institutional members voted approval of this purpose and 
planned to join the American Council of Guidance and Personnel 
Associations with the following member Associations, four of 
which still remain members: 

* American College Personnel Association
Institute of Women’s Professional Relations
* National Association of Deans of Women
* National Vocational Guidance Association
National Federation of Bureau of Occupations
Personnel Research Federation
Southern Women’s Educational Alliance
Teachers College Personnel Association
American Association of Collegiate Registrars (Affiliated)
* National Federation of Business and Professional Women’s Clubs (Affiliated)

Research was the dominant theme of the 1934 conference. 
The papers were definitely slanted toward “the personnel 
point of view,” and, more particularly, to "individualized 
problems of students.” 

Dr. Grayson Kefauver, of Stanford University, keynoted 
the 1935 convention with his address on "Developments In 
Educational Institutions.” The contrast between the mech¬ 
anistic and individualized philosophies of education was 
sharply drawn. He made it clear that personnel policies should 
be formulated in terms of the latter philosophy. 

As an Association, this was a year for action. Seven thousand 
names of college staff members throughout the country, who 
had responsibility for personnel functions, were contacted to 
further professional solidarity in the personnel field at the
college level; a formal offer was made of the services of the
Association to the federal government “in making and executing
plans for the services of youths between 18 and 26 years of age.”

* Current members of the Council of Guidance and Personnel Associations.

The theme of the following year (Problems of Personal 
Adjustments in Moral, Religious, and Social Relations) 
reflected the same point of view. J. Hillis Miller, then Associate
Commissioner of Education in New York State, raised a 
fundamental question. “Is personnel work an adjunct to, or 
is it education itself?” The question was clearly answered. 
Personnel services are not superimposed upon the educational 
process; they are an integral part of it. 

Vice-President Hopkins gave us the follow-up twelve years 
later, in 1948, in his paper on “The Essentials of a Student 
Personnel Program.” The whole-hearted response to an 
individualized philosophy of education, accepting the theory 
first and then putting it into practice, means that it must be 
written into the educational philosophy of each institution 
and considered to be the means of education, not adjunct to 
education. 

As an Association, we voted to continue as a member of the 
ACGPA and request the Council to continue its three committees
(Research, Publications, and Coordination). Dr.
J. E. Walters proposed “an investigation into what personnel
services are being rendered at present in different colleges and 
universities.” He urged that a committee of three be appointed 
to initiate research projects such as (1) the advisability of 
formulating a statement of types of preparation offered for the 
training of personnel workers, (2) the preparation needed for 
college personnel work. (Corrine LaBarre made such a report 
in Columbus, Ohio in 1947.) 

Another action worth commenting upon dealt directly with 
us—the placement needs of our own membership. It was 
proposed that the Chicago Collegiate Bureau be used as a 
clearing house for filling personnel positions and placing 
personnel workers. We are still working on such a program, 
as you will hear on Wednesday. 

The personal element enters into the next step in our history 
for it concerns my own first attendance at an annual meeting. 
New Orleans was the place, 1937 the year. Esther Lloyd-Jones
talked on “What is this thing called Personnel Work?” She
defined the more immediate needs of the association as: 

1. a continued effort to clarify the nature and scope of our 
professional field, 

2. fundamental modification of the constitution to conform
to our changing conception of the nature of the personnel 
program in higher education, 

3. and continued, careful, patient but aggressive attempts to 
cooperate within the A.C.G.P.A. with the guidance and 
personnel groups in the Council. (Akron). 

This was also the year that W. H. Cowley gave us “A 
Preface to the Principles of Student Counseling,” stating three 
fundamental characteristics of counseling: 

counseling as the personalization of education, 

counseling as the integration of education, 

counseling as the coordination of student personnel services. 

He defined counseling broadly, "seeing the student and working 
with him as a whole person.” No vain imaginings, for again 
we read this as a follow-up on the thinking of ACPA members
expressed two or three years earlier. 

Our 16th annual meeting was held in Cleveland, with
Dr. A. J. Brumbaugh speaking on "Personnel Services in the 
Light of Current Trends in Higher Education.” After presenting 
to us the “unitary nature” of the early American college, with 
the basic curriculum, he showed that as time advanced the 
program of colleges became more diversified, both as to scope 
and content, and that "fan like, higher education extended 
wider and wider in more divergent directions.” By the 19th 
century we had denominational colleges, women’s colleges, land 
grant colleges and specialized types like art, business, and 
normal schools, an increase from the 10 unitary colleges before 
the Revolution to the 2000 institutions of higher education
today. Now, after the first of the 20th century this elective 
system is indicted in many quarters on the ground that it 
has led to early specialization at the expense of a broad liberal 
education. The assumptions of that period point "to a unified 
and generalized educational experience in direct contrast 
to the specialization that has prevailed.” Just what shall be 
the nature of General Education is still a matter of opinion and 
experimentation.

A new movement, in quite another direction from that
of the return to liberal arts, has developed. The leaders of this
movement believe that the essential unity of general education 
is not achieved through curriculum, but through the educational 
experience of the individual. Thus, the interests and aptitudes
of the individual student must constitute the focus of this education,
in which he acquires self-discipline, integrates learning
with experience, and functions creatively in the society in which
he lives. These two trends both attempt to achieve an essential
unity in college education, one by way of intellectual disciplines, 
the other by way of individualized educative experience. 

The personnel services provided in any college must be based 
upon the purposes of the college and the needs of the students. 
Some services will be the same in all institutions regardless of 
individual differences because some of the student problems and 
difficulties will be the same, such as: selection of a college, 
variations in student interest and abilities, choice of vocation, 
the social development of the students, the health of the 
student, financial aid. The effective functioning of the intellect 
depends upon many collateral factors, as well as the free and 
disciplined intellect. 

Functionally, as an Association, the year 1939 was memorable 
in our history. The CHARTER for the ACPA became a 
published reality. This charter was drawn by “the Commission
on Reorganization of the ACPA” appointed in 1937 and
composed of Esther Lloyd-Jones, Karl Onthank, and C. Gilbert
Wrenn. Basic to the preparation of the charter was the viewpoint
reflected in The Student Personnel Point of View, a
brochure published by the American Council on Education. 

This was the year that a committee with Edith Weir as 
chairman was appointed to write the history of our Association. 
At the preceding convention which celebrated the 15th Anni¬ 
versary of ACPA it was found that only a fraction of the 
members knew the early thought and effort which brought 
about our organization. Thus, it seemed time to review our 
past and to secure, from the early members, the information 
which only they possessed. 

The history was to be a compact record covering the various 
periods of growth from problems of teacher placement to 
the broader personnel phase, and the effort to develop programs 
covering various fields of endeavor with the resulting changes
in names. Mrs. Cheney’s forty years of placement work had
given her invaluable knowledge of early personnel development 
not to be obtained from any other source, and it was felt 
that this should be recorded while she was still alive. Mrs. 
Cheney was given a life membership with full privileges in the 
Association. 

The need for regional groupings was discussed for the first 
time. May I quote: “to organize regional meetings in many 
sections implies a great deal of preliminary missionary work 
on the part of the membership committee.” No action was 
taken. The Committee on Relations with Faculty Advisors 
reported that the Committee had not yet advanced to making 
any recommendations concerning an invitation to faculty 
advisors to become members. The recognition of needs at 
one convention and their implementation at another have 
characterized the pioneering of our organization from the 
beginning. An even more appropriate illustration is found in 
Daniel Feder’s suggestion at the 1940 meeting that we foster 
the establishment of a journal to print research in educational 
personnel and other closely related fields. This did not become a 
reality until 1944. 

The St. Louis Convention was held under the leadership 
of Helen Voorhees. A membership of 239 was recorded. At 
this meeting a panel prognosticated on “The Future of Student 
Personnel Work” with the major issues summarized under (1) 
the functional curriculum and student personnel work, (2)
the teacher and personnel work, and (3) the need for a strong
national organization. This seemed to be such a forward-looking
program that I wish to bring the summary, via the 1950
proceedings (Vol. XVII, p. 19), to you in full.

The panel discussants included: A. J. Brumbaugh, C. F. 
Malmberg, H. W. Bailey, H. D. Bragdon, D. Stratton, and 
H. H. Moreland. 

The major points of issue are summarized under the following 
headings: The functional curriculum and student personnel 
work; The teacher and student personnel work; The need 
for a strong national organization. 

The functional curriculum and student personnel work: The
present need for student personnel work in our colleges and 
universities arises largely from a curriculum centered in subject 
matter rather than in student needs. Furthermore, this curriculum
is taught by instructors who are narrowly trained
in subject matter areas. The functional curriculum, if and 
when we adopt it, will probably preclude the necessity for 
having personnel officers, at least of the same type as at 
present. This point of view is generally held by most personnel 
workers. However, curriculum changes never occur with light¬ 
ning-like rapidity. One discussant who had made a careful and 
exhaustive study of the history of higher education, maintained 
that the functional curriculum will not dominate higher educa¬ 
tion for several centuries. In the meantime, personnel workers 
have much to do. Others hold that a real possibility exists of a 
radical change in higher education. If institutions of higher 
learning do not change from their traditional ways, mounting 
economic and social pressures will force changes. 

The teacher and personnel work: Can teachers be trained in
the personnel point of view so as to take over a large number 
of functions now administered by personnel officers? One point 
of view maintains that college teachers cannot and will not be 
trained in personnel methods and viewpoints because of the 
nature of graduate training and the traditions of research and 
scholarship. By and large, college faculties are recruited from 
the graduate schools. Graduate training is oriented toward 
research, not toward students or teaching. Furthermore, academic
rewards are not won by the Great Teacher, but by the
Great Scholar. 

The opposite point of view, held by a large number of 
people, is that all teachers should be trained personnel workers. 
With such additional training, teachers would do a better job 
of teaching and students a better job of learning. While this is 
acceptable in regard to secondary teaching, the adherents of 
the first point of view hold no hope that this can be accom¬ 
plished at the college level. They maintain that student 
personnel specialists would still be necessary even with a 
functional curriculum and with the student point of view. 
They will grant that the faculty may play a role in the instruc¬ 
tional type of personnel work; e.g., remedial reading, how-to- 
study, etc. 

In a few places, faculty members and graduate students 
who expect to teach are taking courses in methods of teaching, 
personal counseling, etc. Summer workshops, such as are offered 
at several centers, are organized to give personnel and teaching 
experience to college teachers. These innovations are unique, 
however. 

The need for a strong national organization: One viewpoint 
maintains that college personnel work will always be a sideshow 
of education unless we have a strong national organization 
which unifies all the branches of student personnel work. The 
opposition states its point this way: Strong national organiza¬ 
tions have a point and push it. Student personnel work is not 
yet ready for such a vigorous program. We are still experimenting.
Let us not hamper experimentation by adopting
dogmatic attitudes. This is true not only of personnel work 
but of higher education as well. 

The Council of Guidance and Personnel Associations is one 
attempt at unifying the field and providing a strong national 
organization. Some hold that it is not broad enough, that such 
personnel groups as the registrars’ associations, the health
officers, the union managers, etc., should be represented. 
Others state that all college personnel organizations should 
unite into one association, divorcing themselves from organiza¬ 
tions which are made up predominantly of secondary school 
people. 

If representatives of college personnel services want to form 
a unified organization, no blueprint is available. They will be 
obliged to work out their plans in conference, in regional 
meetings, and in group discussions. First, however, they must 
study the problem in terms of the needs of personnel work.

It was at this conference that we also had the first emphasis 
on group dynamics presented. Dr. Ruth Strang reported upon
her research in the field of group work and techniques.

Work with groups often constitutes a more successful way 
than counseling of attaining empathy with individuals, of 
encouraging them to express their emotional problems, of 
providing constructive outlets for their impulses and of reliev¬ 
ing their tensions and anxiety. . . . Economy is a factor in the 
development of group work. In counseling, needs of individuals 
for certain group activities are discovered. Group activities 
serve as avenues of adjustment, thus they have both diagnostic 
and therapeutic values.

Atlantic City, February 18-22, 1941, the 18th Annual
meeting! Membership, 256. The Association gave serious
attention to new membership requirements, “professionally
trained persons and other interested, experienced and competent
workers.” The membership approved of “dignified, slow
expansion and growth.”

The highlight of this meeting was the presidential address.
President Voorhees spoke to us on “The Responsibilities of the
Heritage of Personnel Work.”

We hear much these days, of the advances which have been 
made in personnel methods, but for the time being, I should 
like to look back to the past, to the beginning of personnel work. 
Our predecessors had a rich background in an allied field of 
education; an experience which had given them a firm conviction
and belief in the eternal verities. And they had marked
success in carrying out their educational aims and purposes. 
They transferred their sense of values from the pulpit to the 
field of education. They came to their task possessed of the 
wisdom which comes only from a wide knowledge of human 
nature and its frailties; teaching the virtues which are necessary 
for living and for satisfaction and achievement. 

The purpose of this presidential address was to present some 
aspects of our work which are sometimes forgotten, the spiritual 
values of our profession. 

Great characters, not just great scholars, were produced. 
Men devoted to service, with initiative, self-reliance and 
democratic ideas. 

Have our methods tended to emphasize personality rather 
than the necessity inherent in each of us of becoming a person 
in one’s own right? 

I am eager that one of our openly avowed objectives shall 
be to give the young people in our care some philosophy of 
life which will make it possible for them to get their bearing, 
no matter what happens. 

She was successful both in her presentation and purpose. 

In February, 1942, E. G. Williamson faced an especially 
difficult period of administration. Despite the war, our Presi¬ 
dent led us to think of that future, which lies beyond the 
present, in personnel work. He showed us that anything which 
leads to more effective conservation or utilization of youth’s
potentialities actually does contribute to society’s welfare, 
as well as to the winning of the war. Hence, his address on 
“The Future Develops Out of the Past” was a highlight. 

Whatever our professional and personal behavior is as 
personnel workers, one thing is quite clear. Unless we have 
had the benefit of professional training and experience which 
prove to be effective in our handling of post-war problems, 
then we may expect that society, including college students 
themselves, will push us aside and find other types of personnel 
workers or other types of educational workers to handle this 
type of social revolution. The pressure for a solution to these 
problems, the greater articulateness of students and parents 
and the competition for public favor and support from members 
of social and government agencies, will force college admin¬ 
istrators to deal effectively with this anticipated situation. If 
we cannot do the job, then others will be found to do it. 

I believe that we are adequately prepared for the task and
that we will make an effective contribution to the conservation
of useful idealism, realism and social and personal values.

I believe that our contribution will be such as will strengthen 
our place in higher education and will increasingly attract 
able graduate students to secure the necessary professional 
and personal training to make college personnel work a sig¬ 
nificant part of higher education, competing successfully with 
other social welfare professions for the best talents in each 
student generation. 

The numbers that have been entering our profession have 
borne out his faith in the personnel program. 

From 1943 to 1946 the only record of the Association is that 
of the minutes kept by the secretaries and the record of the 
New York meeting in 1943 when only a limited group met 
under the leadership of CGPA. Our officers and members were 
aiding in the promulgation of war programs in every branch 
of the service; those members left on the campus were doubling 
as personnel officers and campus recreational leaders for the 
military services on our campus. A record of these activities 
would be almost a complete history of the war activities. At a 
meeting of the Executive Council held in Chicago in December, 
1945, a Personnel-o-Gram was born, with Fred McKinney 
named as the first editor. The Council attempted to keep the 
membership informed through the medium of Educational
and Psychological Measurement, published by an ACPA
member, G. Frederick Kuder. The following ACE brochures, 
for which ACPA members had been chiefly responsible, were 
purchased, and distributed to the membership: 

“Counseling and Postwar Educational Opportunities.” 

“Student Personnel Work in the Postwar College.” 

Active participation in the work of CGPA was continued with 
special attention given to regional conferences. “Judicious 
publicity” was carried on by sending a letter to some 1200 
college presidents concerning the Association and enclosing a 
paper written by Dr. John Darley on “Counseling and Colleges
in Post-war Education.” 

By our first postwar annual meeting, held in 1947 at Co¬ 
lumbus, Ohio (moved from Chicago, by consent, when the 
Stevens Hotel would not promise to accommodate all our 
members without discrimination), our Annual Reports were
resumed and issued as a supplement to Educational and
Psychological Measurement. During this meeting we were 
definitely interested in post-war personnel services. In his 
presidential address, “When Colleges Bulge,” Dr. Feder 
awakened us to the imperative need of making an immediate 
and reasonably adequate adjustment to things as they are and 
not as they were in the nostalgic good old days. 

The problems discussed were common to all of our campuses: 
(1) the changed nature of campus population, (2) the generally 
changing motivation and orientation of all college students, 
(3) the need for high caliber professional services in vocational, 
educational and personal counseling of all students, (4) special 
problems, caused by previous military treatment of situations 
similar to those in classrooms, (5) ways in which the integrated 
personnel service program may serve both faculty and student 
body in more effectively meeting student needs. New services 
were being offered to students in their quest for maturity. 

President Feder called our attention to the fact that the 
field of student personnel work has suffered from an ill of its 
own making, the tendency to divorce its findings and activities 
from those of the classroom. As a matter of routine, he insisted 
we must transmit to the instructional staff those findings 
regarding student reactions and needs which will assist the 
faculty in the infusion of the realities, meaning, and purpose of 
contemporary life in the classroom. 

In Chicago, in April, 1948, C. Gilbert Wrenn, our “Chief,”
spoke to us on the “Greatest Tragedy in College Personnel
Work.” It is worth our while to review these tragedies briefly 
for they point the way, just as the meeting, in 1940, on “The 
Future of Personnel Practices,” gave impetus to developments 
of the early forties. Guidance, as a term, was buried, and 
personnel, with its appropriate adjective, was nurtured, so 
that we might speak in common terms with school and non¬ 
school agencies about our concepts, as written into the Charter 
of the Association. Counseling was relegated to its appropriate 
position as one of a number of personnel functions and not the 
entire personnel program. 

Outstanding among these developments are the increased 
participation and consequent demand for professionally
equipped personnel workers; increased facilities for personnel
research; and not least important, an increased humility on the 
part of all of us. 

He was not less concerned to point out pressing problems 
needing to be resolved: 

1. The lack of commonly accepted standards of performance 
and professional preparation. 

2. Students and faculty, who have the most to gain from 
student personnel work, have the least to say about its 
development and emphasis. 

3. Poor coordination of a student personnel program is fre¬ 
quently the result of an incompletely formulated line and 
staff organization. 

4. A student personnel program on a campus tends to be 
isolated from four important influences in the life of the 
student, (a) home, (b) secondary schools, (c) college class¬ 
room, (d) spiritual resources of the campus. 

As a final imagining I wish you to think briefly with me about 
human relationships, a field in which some of us may be 
devoting more of our time than to the more technical areas. 
Certainly this is a field in which we can never become com¬ 
placent with our achievements. The nation’s colleges and 
universities, today, are placing more emphasis on producing 
well-rounded citizens. How would you answer the provocative 
question raised by the Pennsylvania Association of Deans of 
Women, “Do you improve human relationships through your 
guidance services?” Each of us must answer the question! 

When enough people can answer the question in the affirma¬ 
tive we shall indeed have arrived at the place in the personnel 
profession where we do not have to rely on vain imaginings. 
One does not need to sentimentalize the point. There is in¬ 
creasing evidence of a deepening unity among individuals 
and groups devoting themselves to the improvement of human 
relationships through personnel services. 

On a tablet in front of the Old South Meeting House, in 
Boston, are words that describe our Revolutionary forefathers 
as “worthy to raise issues.” They knew which things were 
important and which were unimportant; and a person must 
be mature to raise issues. Most of the small frictions in life, 
human misunderstandings, that destroy mutual confidence
come from raising issues that are not worth raising, and most
of the social inertias and timidities that keep our world from 
moving toward its ideals express a reluctance to raise issues 
that should be raised. One of our great responsibilities is to 
bring more reasonableness into the human scene, to bring it 
to ourselves and to others. To carry our share of this responsi¬ 
bility we need to see in whole, instead of in part, to be ready 
to act responsibly where responsibility is called for, to forget 
ego and to seek wise understanding of others. “Where there
is no vision the people perish,” and our part is to seek to make 
mature individuals of ourselves and others, that we may bring 
about a community where human beings may realize their 
visions. On a recent drive in my state I passed through three 
neighboring communities, Vista, Fairplay and Humansville, 
which have given me a new philosophy for daily living. “Where 
the vista is right, there will be fairplay in humansville,” and 
you and I must make it come to pass. 



EVALUATION AND RESEARCH IN GROUP 
DYNAMICS 

KENNETH F. HERROLD 

Assistant Professor of Education, Teachers College, Columbia University 

Understanding and evaluation of group dynamics must 
be in terms of the nature of the times in which men live. 
Industrialization has led to the wholesale, and sometimes 
indiscriminate, application of the scientific method to the 
material universe. The phenomenal and, at times, ghastly 
social and technological changes of this century have led to 
collective hysteria in one form or another in all parts of the 
world. The future of social science and, more important, the 
fate of mankind depend upon whether or not the populations 
of the world can adjust their living to the atomic era and live 
and work together intimately and creatively. 

The search for a science of human relations is not new. 
Some have denied that it ever would be possible to apply 
scientific methodology to the processes of human relations 
and at the same time to preserve individual freedom and a 
democratic society. These are legitimate challenges. Others 
believe that we must develop new forms of social discipline 
for interpersonal relations. There has always been appropriate 
skepticism of such suggested social innovations and inventions, 
and there has always been some inappropriate skepticism 
concerning the contributions of social scientists like those 
engaged in group dynamics research. 

Group dynamics is a term usually associated with certain 
concepts and procedures of research and study identified 
with Kurt Lewin and the Research Center for Group Dynamics 
first established at the Massachusetts Institute of Technology 
and later moved to the University of Michigan. However, 
group dynamics has aroused the interest of other social scien¬ 
tists who have never worked intimately or directly with the 
Lewinian group. These social science explorers are contributing 
valuable knowledge to the understanding of how groups
behave. Lewin, Lippitt, White and their associates have
represented a strong team. Their work, such as the Iowa 
studies of the “Social Climate of Groups,” 1 has stimulated 
thought and controversy which has forced many others to 
consider basic problems of group and intergroup behavior 
before they might otherwise have done so. It is necessary and 
appropriate to indicate that responsible research and training 
in group dynamics is now being carried on at Harvard, Min¬ 
nesota, London, California, New York University, Columbia, 
Northwestern, and many other institutions of advanced study. 
There is no question of the status of the staff of the Research 
Center at the University of Michigan. However, to associate 
the development of group dynamics as a respectable field 
exclusively with this Research Center is to limit the progress of 
knowledge and inquiry. 

The turbulence set up by the group dynamics enthusiasts
has not been universally supported or accepted. Dean Robert
B. Browne recently said of group dynamics: 

We are told that we have here something new and basic. 

One would like to remain receptive to what is new and basic 
without prejudice. We are told that here is something scientific, 
from Bethel laboratories and the M.I.T. and Michigan Re¬ 
search Centers. We all have a great respect for scientific 
inquiry and a staunch faith in its usefulness. We are told that 
here is something awfully democratic, and that seems to be 
all to the good. Furthermore, we are assured it’s for leaders, 
which ought to guarantee crowded classrooms where leadership 
training is offered. But just what is this new, basic, scientific, 
democratic leadership training furor, and what is there about 
it that is as yet either new or scientific or democratic or dynamic 
or even useful? 2 

Other words of criticism and challenge have been leveled 
at the proponents of this approach to an understanding of 
group relations. The development of studies of group dynamics 
has been accompanied by considerable misunderstanding, 
misinformation, and erroneous interpretation. The term
“group dynamics,” therefore, signifies great challenge and
hope to some, and to others it represents but a transient
panacea.

1 Lippitt, R. and White, R. “The ‘Social Climate’ of Children’s Groups.” Child
Behavior and Development (R. Barker, J. Kounin, and B. Wright, Ed.). New York:
McGraw-Hill, 1943.

2 Browne, Dean Robert B. From an address delivered at the annual convention of
the National University Extension Association held in Edgewater Park, Mississippi,
May 4, 1949.

It has been said that the social unit approach to under¬ 
standing of human behavior denies the uniqueness of the 
individual. 3 Need it be dogmatically an “either or” relationship? 
Can we ever begin to meet student personnel needs on a 
strictly individual basis? Those of us who are daily confronted 
with impregnable schedules of individual appointments know 
how difficult it is to achieve even a satisfactory quality in our 
counseling relationships. Professional competence in the use of 
groups and in the analysis of group dynamics can be achieved 
without detracting from the “student personnel point of view” 
and without attenuating the warm and friendly relations with 
students. In fact, the relations of personnel administrators 
with students, especially students in groups, may become even 
more respectable the better we understand the behavior of 
people in groups. 

Misunderstanding of the objectives and procedures of group 
dynamics is, in part, due to the rapid growth of the field and
the customary lack of adequate communication which accompanies
social innovations. It is also due to the inability
or the lack of opportunity adequately to define the nature of
group dynamics. This is regrettable. The purpose of this brief
paper is to attempt to present: (1) one definition of group
dynamics research and application, (2) a citation of certain 
problems in its developing research and evaluation studies, 
and (3) a prediction of some of the possible applications of 
such training, research and evaluation in college personnel 
administration. 

It would seem wise first to understand what those interested 
and working in the area of group dynamics are trying to do 
before we examine their research and certainly before an 
attempt is made to evaluate their activities. 

With what is group dynamics concerned?

First, group dynamics is concerned with an understanding
of the group related factors, forces, and determinants which
influence individual behavior in groups and the course of
social change. Lewin described this goal in his statement of the
“Frontiers in Group Dynamics.” 4 In general, a group is a
social organism of describable structure and function. In most
instances the members of a group maintain a face-to-face
relationship. Group dynamics is also concerned with the
achievement of an understanding of groups as groups and the
fundamental laws which govern the behavior of groups. Obviously
such studies must rely upon the theoretical and applied
experience of all the social sciences, but especially upon
social psychology, individual clinical psychology and cultural
anthropology.

3 Snygg, Donald and Combs, Arthur W. Individual Behavior. New York: Harper
Brothers, 1949. P. 183.

Second, group dynamics is concerned with improving the 
application of already established knowledge and skills of 
human behavior to the critical social problems of our times. 
This objective was made explicit in Lewin’s second statement
on the “Frontiers in Group Dynamics.” 5 Impatience with the
manner in which social action has developed led many to
challenge applied social science in recent years. Sellitz and
Cook expressed this concern in their inquiry “Can Research in
Social Science Be Both Socially Useful and Scientifically
Meaningful?” 6 It has been stated that it requires fifty years
for society to adopt a new idea or practice. Much of the knowledge
of theory and practice now being utilized by those concerned
with the study of group behavior was developed and
established many years ago. Some of the psychologists and
social scientists who are today most critical of the “group 
dynamics” movement assisted in the early discovery and 
definition of social phenomena which form the theoretical 
base upon which the group dynamics movement is established, 
and these critics continue to reiterate the same or similar 
concepts with respect to social action, with no awareness of 
their commonality with group dynamics. It is often quite a
different matter to apply social science than it is to write
about it or to discover its laws in the protection of controlled
laboratory conditions. In fact, it is always difficult to apply
what we believe, to reduce the lag between our knowledge of
social science and our application of that knowledge to problems
of human relations and social advance.

4 Lewin, K. “Frontiers in Group Dynamics: Concept, Method and Reality in
Social Science; Social Equilibria and Social Change.” Human Relations, I (1947), 5-41.

5 Ibid., I (1947), 143-153.

6 Sellitz, Clair and Cook, Stuart W. “Can Research in Social Science Be Both
Socially Useful and Scientifically Meaningful?” American Sociological Review, XIII
(1948), 454-459.

The third aspiration of those concerned with group dynamics 
is to enlist other social scientists from a variety of disciplines 
in the further study of group development, interpersonal 
relations within groups, relations between groups, and the 
basic laws of human relations. This requires an unusual type 
of intellectual maturity and material; it requires a complete 
integration of the basic theory and practice of many disciplines 
of social science. 

The present departmentalization of subject matter and 
professional training has handicapped this type of integrated 
study and practice. Furthermore, the study of group dynamics 
must be experience-centered and many of our institutions 
are not yet ready to utilize the community as a laboratory 
for advanced study and skill development. Time and space
will not even permit their cooperation in integrated study 
within the institution. Communities uninitiated to such 
working relationships with university or academic men must 
also be cultivated to utilize such resources. One may speculate, 
however, whether the community is more ready to have the 
academic man step outside his ivory tower than is the academic 
man ready to leave the protection of his isolation. This is, of 
course, a criticism every generation makes of the scholar and 
the researcher. Students and professionals in advanced studies 
in many fields are demonstrating the importance of an under¬ 
standing of group dynamics and the application of group 
procedures on their work. A few examples may be cited to 
document this assertion. Dr. Max R. Goodson of the College of 
Education, Ohio State University, has described the implica¬ 
tions of social engineering in public school administration. 7 

7 Goodson, Max R. “Social Engineering in a School System.” Progressive Education,
XXVI (1949), 197-201.

Dr. Edward C. Tolman of the University of California,
recipient of the 1949 Kurt Lewin Award, in his Memorial
Lecture 8 added to the theoretical structure of the basic
concepts of group dynamics in social learning. Dr. Tolman
stressed the nature of the influence of drives, beliefs, goals,
perceptual readiness and perceptual blindness. His concepts,
methods, and findings will more adequately illuminate the
complex nature of student mores.

Dr. Harold Fields, of the Board of Education of New York, 
recently reported in an unpublished manuscript the develop¬ 
ment of a group-interview technique used in the selection of 
teachers. This technique utilizes several of the common group- 
dynamics evaluation procedures such as systematic observa¬ 
tions of candidate behavior in situational tests involving six 
candidates. This method demonstrates how individuals behave 
in situations of reality and is a more realistic evaluation than 
paper-and-pencil test material. A similar procedure is now 
being considered for the selection of candidates for admission 
to one of New York’s medical colleges. Numerous professional 
and lay organizations are also utilizing group processes in their 
administrative procedures and in program development. 

It is difficult to discuss research and evaluation in group 
dynamics when the field, as such, is not yet adequately defined, 
or acceptable to many in the professions related to social 
science. The basic philosophy, concepts, values, and skills 
which constitute the common core of group dynamics theory 
and practice will make a fundamental contribution to the 
improvement of human relations and the advancement of 
social science because this core is rooted in established and 
fundamental concepts of the basic social sciences. 

The controversy over the nature and importance of group 
dynamics has been widespread. In fact, the cry of the an¬ 
tagonists is raised so loudly that one is tempted to echo Shakes¬ 
peare, “The lady doth protest too much, methinks.” The 
feelings of Dean Browne are shared by many critics who are 
concerned with: “the cult” of the proponents; “the uncritical 
acceptance” of group-dynamics discoveries; “the verbiage” 
in which they (the proponents of group dynamics) are im¬ 
bedded; “the pseudo-learned special dialect”; “the wild 

8 Tolman, Edward C. "The Psychology of Social Learning." Journal of Social 
Issues, Supplement No. 3, Dec. 1949. 




oscillation, and breakdown which results from the feed-back 
correction and over-correction”; and “the diffuse vagueness 
of the literature on group dynamics.” However, as Dean 
Browne urges, “It would be part of wisdom first to try to 
understand it,” and since this social innovation is established 
upon the theory and practice of a multi-disciplinary field it 
behooves the critic to be reasonably well acquainted with the 
theory and practice of these several disciplines before he 
criticizes the emerging ideas, concepts, practice and research 
of group dynamics. 

What of the nature and trends in group dynamics research 
and evaluation? 

The importance of research in group dynamics—inter¬ 
personal relations, if you will, is no longer a rhetorical question. 
Human relations today produce problems of major significance 
in politics, industry, medicine, and community living. Educa¬ 
tion has its own personnel problems. 

It is necessary, however, to recognize certain technical 
limitations. Let us consider, for a moment, the requirements 
of research which the workers in group dynamics studies have 
found vexing. Dr. Richard Crutchfield, writing in the American 
Psychologist 9 for the joint Committee on Public Service 
Standards in Social Psychological Research, reported: 

In its phenomenal growth during the past fifteen years 
social psychology has exhibited certain faults common to any 
rapidly growing field of science. There has been an unevenness 
in the quality of the research carried on and an unevenness in 
the training and competence of research workers. Moreover, 
because its problems have an immediate bearing on practical 
problems of everyday life, the applications of social psychology 
have tended to outstrip basic research. Practical pressures 
will continue to favor the applied phases at the expense of 
basic theoretical research and methodological development 
upon which sound application must be founded. 

Dr. Crutchfield here describes one of the reasons why there is 
so much confusion about group dynamics in the minds of 
practitioners of personnel work and of the social science 
world in general. 


9 In The American Psychologist, IV (1949), p. 112. 





Dr. Donald G. Marquis, 10 in his Presidential Address to the 
American Psychological Association, in 1948, also stressed 
certain difficulties in achieving the objectives of social science 
in the study of group dynamics. Dr. Marquis indicated that 
early research in psychological frontiers suffers from a lack of 
theoretical structure to guide the inquiry, of an accepted and 
adequate terminology, of “standard measurement techniques 
for the relevant variables,” and that it suffers because, as is 
true of all workers in new fields, those engaged in group- 
dynamics research are often “dismayed at the absence of the 
simplest kinds of taxonomic data on the materials of their 
study.” Consequently, early research reports often appear 
to be inferior and quite unrelated. 

These difficulties, described by Crutchfield and Marquis, 
prompt the kinds of criticisms cited by those who challenge the 
group-dynamics approach to an understanding of social 
phenomena, described earlier in the words of Dean Browne. 

The verbiage of every professional in-group tends to exclude 
others temporarily. The terminologies of psychoanalysis and of 
atomic physics were disturbing a short time ago, yet today they 
contribute to the language of every family. It would seem 
necessary, however, for us, who are interested in the objectives 
of group dynamics, to guard against any exclusion by association 
or communication if the positive and constructive contributions 
this vigorous field of endeavor has to make to knowledge and 
skills in interpersonal relations are to be realized. 

Those who are disturbed by the development of interest 
in group dynamics must likewise examine carefully these 
developmental manifestations, so that they may gain proper 
perspective in the examination and use of the knowledge 
that is constantly being developed. The immediate objectives 
of research in group dynamics are to develop: (a) a respectable 
theoretical structure to guide their inquiries; (b) an acceptable 
and adequate terminology; (c) standard measurement tech¬ 
niques for the relevant variables; and (d) the collection of 
adequate taxonomic data. These are difficult tasks which 
require time and the integration of several heretofore appar¬ 
ently unrelated disciplines and frames of reference. Those 


10 In The American Psychologist, III (1948), p. 431. 





involved in the development and application of applied social 
science research findings concerning group dynamics either as 
critics or workers will have to proceed with vitality tempered 
with common sense and self-discipline. Recently a Dean of 
Students remarked: 

We are establishing a faculty-student advisory program. We 
do not know how to group the advisees. We do not know how 
to introduce the available faculty advisors into the groups so 
that the group will achieve maximum productivity with respect 
to their needs. How can one form groups and include a faculty 
advisor when there is such a wide variance in individual 
capacity, experience, expectation, and in basic personality 
structure? 

Of course, we can use the traditional unitary systems of 
grouping, such as intelligence or age or problem, but this is 
neither effective nor realistic. Heterogeneity is a conspicuous 
and common characteristic of group relations. It is always 
necessary to learn how to work with people who are different. 
It is naive to educate and to speculate or to rely upon the 
possibility of always being able to work with those who are 
of the same color, the same religion, the same values, and the 
same basic capacities. The problem of this particular personnel 
administrator is to help the faculty advisers and the students 
to learn how to work together with their differences. The 
problem of grouping is made difficult by a number of compo¬ 
nent problems. Dr. Morton Deutsch of the Research Center 
for Human Relations of the New School for Social Research, 
has described in detail the influence of competition and co¬ 
operation on group process and development. 11 Another often 
neglected aspect of group experiences is the psychology of 
learning. In this group guidance situation, learning is an 
important factor. Dr. Herbert Thelen, Associate Professor of 
Educational Psychology, University of Chicago, has made a 
worthy contribution to understanding this aspect of group 
process in a recent review of "Group Dynamics in Instruction: 
Principle of Least Group Size." 12 As the title indicates, this 

11 Deutsch, Morton, "An Experimental Study of the Effects of Cooperation and 
Competition Upon Group Process." Human Relations, II (1949), 199-231. 

12 Thelen, Herbert A., "Group Dynamics in Instruction: Principle of Least Group 
Size." The School Review, LVII (1949), 139-148. 




study also treats the importance of a desirable number of 
participants in a working group. Learning to work together is a 
prerequisite of group problem-solving which no idyllic platitude 
of brotherly love will satisfy. The hazy generalizations with 
which we often diagnose and prescribe for many student 
personnel problems are rarely specific enough to resolve the 
knotty problems of interpersonal relations. 

A director of student activities who has responsibility for 
student housing states, 

We have practically no social communication between the 
dorm students and the fraternity groups. Our campus has 
achieved no group standards which are commonly accepted, 
and one result of this lack of communication and lack of 
standards is a lack of morale and cohesion in the larger and 
common educational and developmental program. Our rules 
and regulations won’t work. 

In fact, without careful and respectable research there is no 
easy or fruitful answer to this problem. Personnel administrators 
will do well to consider seriously the types of research studies 
now being carried out in the area of group dynamics and 
interpersonal relations. 

These situational problems indicate the need to understand 
the basic dynamics of the interpersonal relations but they also 
emphasize the need for a specific type of research and applied 
skill training in the professional training of student personnel 
administrators. Education and training in applied social 
psychology, group dynamics, and action-research skills, neces¬ 
sary for effective change, are too often luxury items or after¬ 
thoughts in the formulation of a program of professional 
education. The importance of research skills has often been 
minimized in the graduate preparation of our personnel 
administrators in higher education. If college personnel 
administration is to continue to be a respectable and sturdy 
profession more attention must be given to the development 
of student personnel policies and procedures substantiated by 
appropriate and reliable research. 

“After a year and a half of study a committee of five members 
of Princeton University faculty released today a 7,000 word 
report on the state of undergraduate faculty relations at 




Princeton.” Reported in the Sunday New York Times on 
March 19, 1950, this article described “tensions on the campus,” 
the need for “students and faculty to know one another 
better,” the need for “undergraduates to forget ‘the fear of 
apple-polishing’ and to take the initiative with faculty members 
to recognize the obligations of a kind of campus citizenship 
not unlike their civic obligations in the community.” It is 
necessary for the college personnel administrator to have at 
his command the methods and the skills of research which will 
enable him to isolate the basic problems, review the knowledge 
of such problems in other spheres of influence and interpersonal 
relations, initiate sound and revealing procedures for pre¬ 
liminary observations, construct a reasonable theory of 
causation, and verify such assumptions through the application 
of procedures which promise some reasonable assurance of 
reducing the tension. 

The Phelps-Stokes Fund recently released a report 13 of the 
needs of some 400 foreign students from Africa. These students 
came here inspired with a desire “to aid their homelands 
toward independent status or simply to better the lot of their 
fellow-countrymen.” Patterns of segregation and discrimination 
in American colleges and universities embittered these hard¬ 
working, self-sacrificing students who came to American 
colleges because they offered a wider range of courses and 
experiences. This is a personnel problem of our native born 
American Negroes, Jews, Catholics and other cultural groups. 
To ignore or give lip service to theoretical democracy without 
doing something practical about the problem will contribute 
further to our national and international discord. This is a great 
opportunity, and much is being done to analyze the problem 
and to meet it in a concrete manner, but it is with just such 
problems that those involved in group dynamics studies are 
concerned. 

Is it not an appropriate moment for college personnel 
administrators to seek financial support for a series of research 
studies of the most pressing college personnel problems which 

13 This report was made public by the offices of the Phelps-Stokes Fund, 101 Park 
Avenue, and was prepared under the leadership of Dr. Ruth C. Sloan of the State 
Department and Ivor G. Cummings of the Colonial Office. Dr. Channing Tobias 
reported that the project and report were strictly private and not official in character. 




are essentially problems of interpersonal relations and group 
behavior? Requirements of such research work demand a 
team of specialists in group relations, social psychology, 
personnel administration, mental hygiene consultants and 
others oriented to research procedures and also to practical 
problems in student personnel administration. It will be 
necessary to delimit many of the problem areas and to recognize 
that many of these pressing problems are essentially those 
which have to do with group relations or with the potentialities 
of the group as an appropriate medium for the satisfaction of 
certain student personnel needs. 

Psychological research within our time has accomplished 
respectable achievements in comparative psychology, physi¬ 
ological psychology, in the psychology of learning, of mental 
abilities, and in social psychology involving political science, 
sociology, anthropology, and economics. The frontiers of 
research in human relations and of group dynamics are beyond 
the daily practice of most of us. 

Kurt Lewin, a well-spring at the frontier of research in 
interpersonal relations, described in his last writing the cutting 
edges of research in group dynamics. Those who have carried 
on this work have been striving to reduce the unknown. Lewin 
was certain that “the scientific aspects of this development 
(e.g., group dynamics research) center around three objectives: 
(1) integrating social sciences; (2) moving from the description 
of social bodies to dynamic problems of changing group life; 
and (3) developing new instruments and techniques of social 
research.” 14 

The student-personnel administrator can recognize that the 
student and faculty society with which he works provides a 
challenge for such study. Certainly the dean, adviser to 
students or director of student activities, as well as the psycho¬ 
logical counselor, need at all times to remember the functional 
importance of: (1) the social environment and the sociology 
of the community in which the individual lives and works; 
(2) the individual differences in behavior and especially in 
responding to the environment; and (3) the cultural mores 
and standards which influence the needs of the individual and 

14 Lewin, Kurt. Human Relations, I (1947), 5. 




the dynamics of the social units of the collegiate society of 
which the face-to-face group is a basic structure. We can 
utilize the basic knowledge of sociology, individual clinical 
psychology and cultural anthropology, but we must learn to 
apply this knowledge in situations of reality of which we and 
our students are a part. The nature of individual behavior 
and of the behavior of campus groups as groups is constantly 
changing. No complacent, static concept of the nature and 
function of these social units of the college will suffice. The 
personnel administrator must develop more critical procedures 
for the examination of the social phenomena of student and 
faculty life. The approach to such understanding utilized 
in the phenomenological approach to social and group ex¬ 
periences is indeed promising and challenging. Dr. Robert 
B. MacLeod, 15 Head of the Department of Psychology, Cornell 
University, has described certain professional needs which 
many personnel administrators consider of grave importance 
to the development of our professional skills. The first is the 
need for a systematic procedure of observing and describing 
the characteristics of experiences of people in groups. The 
second is the need to suspend many, if not all, of our naive 
assumptions as to the underlying mechanisms which prompt 
the behavior of people in groups. Third is the need to develop a 
set of principles by which it will be possible to determine 
what is happening in our college social and group life, how it 
occurs, when and where. Ultimately we may know why. 

15 MacLeod, Robert B., "A Phenomenological Approach to Social Psychology." 
Journal of Psychological Reviews, LIV (1947), 193. 



THE CREATION OF AN EFFECTIVE FACULTY 
ADVISER TRAINING PROGRAM THROUGH 
GROUP PROCEDURES 

IRA J. GORDON 

Associate Professor and Counselor, Counseling Bureau, Kansas State College, 
Manhattan, Kansas 

You may recall that last year, at the convention, Dean 
Maurice Woolf of Kansas State left us with the statement: 
“A blissful unawareness of the impossible is all you need,” 
in order to venture forth on a faculty advising program. He 
further laid down for us some basic concepts underlying an 
approach to faculty cooperation. At that time I was a faculty 
member at Kansas State attending the sessions as an onlooker. 
This summer, after joining the Counseling Bureau, the author 
felt that there was a chance to put these concepts into effect 
to a degree beyond that to which they had been tried. So to speak, 
we decided to demonstrate the efficacy of the concepts, and 
our beliefs in the value of faculty participation in advising. 
Thanks to Dean Woolf's groundwork, we had a faculty advisory 
program, and a faculty group of 250 who were involved in it. 
The problem that presented itself was the utilization of these 
advisers so that they could function effectively at their work 
with freshmen. The nature of the situation was such that these 
people, to a great degree untrained, had to be exposed to a 
training program of a dynamic nature over a relatively short 
period of time—the Fall Semester. 

Faculty advisers were spread out over virtually one-third 
of the staff in all of the various schools. These staff members 
were responsible each for a small number of freshmen, usually 
ranging from six to ten. These faculty members were ill-trained, 
and many of them were new or had had no training. This 
difficulty was created by an old-line feeling on “the hill” that 
advising required no skill; that any intelligent professor can 
give “good advice” to students. There was also the feeling 





that college students should be mature, and should not require 
help. Some felt that students have no problems other than 
vocational choice. 

The Counseling Bureau did not exert administrative control 
over the advisers. They were under the control and pay of the 
academic deans and their names were furnished to the Bureau. 
Therefore, there was no direct line of authority between the 
Bureau and the advisers. The former functioned as liaison, 
and as the data-supplying agency. The Bureau had, before 
September, 1949, attempted to provide some minimum of 
training, mostly through one to four short lectures 
covering skills in test interpretation, concept of the counselor’s 
role, specific information, etc. (This was rather limited in scope.) 

The advisers were on the job. They had been given the 
cumulative folders, they were seeing students, and many of 
them felt that they could not make use of the information 
the Bureau was furnishing them. There was a strong need for 
holding the cooperation of the faculty gained over the last 
few years, and a strong need to move the program forward on 
more solid ground. With this in mind, the author, with the 
support of the Dean of Students and the Counseling Bureau 
staff, decided to institute a volunteer training program for the 
faculty, using small group procedures as the method of 
instruction. 

Our major philosophy governing the operation of these 
groups was that of democratic group procedures. We desired 
to keep the situation free and permissive so that the groups 
would feel free to move along the lines they wished, and at 
the rate they wished. We desired that participation remain on a 
voluntary basis, so that those who wished to join could really 
feel in harmony with the program. We wanted the situation 
to be one in which negative and hostile feelings, personal 
feelings, could be expressed. We intended to place responsibility 
for learning in these groups where we felt it belonged—on the 
advisers. The training program, therefore, was built around 
group discussion, role playing and live experiences. 

We believed, and our experience has substantiated our faith, 
that these groups, on their own initiative, would cover the 
areas that they considered the most significant to them, and 





that there would not be much discrepancy between what the 
professional counselors considered important, and what the 
advisers considered important. On this belief, we did not 
intrude our ideas on what should be covered, or attempt to 
indoctrinate the participants with any one counseling point of 
view. We felt our role to be that of supplying resources when 
asked, giving aid on how to discuss what they had decided was 
pertinent, and, after the collections of people had become 
groups, to participate as members in the full sense of the word. 

This idea was presented to the faculty advisers at a meeting 
on September 7, 1949. The basic philosophy behind the program 
was included in the presentation, as well as a partial list of the 
possible areas to be covered. Ninety-seven advisers, including 
many department heads, and all save one of the assistant 
Deans of the various schools on the campus volunteered to 
participate. A breakdown of the figures reveals the following 
information: 65 per cent of the advisers in the School of Home 
Economics attended sessions, 41 per cent of the advisers in the 
School of Engineering, 34 per cent of those in Agriculture and 
24 per cent of those in Arts and Sciences participated in the 
first semester. 

They were divided into training groups on the basis of the 
amount of time they had free, the length of time they wished to 
participate, the hours of the week available, and the depart¬ 
ments in which they taught. An attempt was made to make 
the membership of each group a heterogeneous one, so that 
there could be a free exchange of ideas and information among 
the people with varied backgrounds and training. 

Three groups, consisting of a total of thirty members, met 
for one session a week. Three, consisting of thirty-five members, 
met for one session every other week. The author was the 
resource person for these six groups. Two groups, totaling 
eighteen members, met once a week for five meetings before the 
advising period, and one meeting during the period. They 
worked with Professor Paul Torrence, Bureau Director. 
These people, because of time pressure, decided to meet for 
this short length of time. Two groups, of thirteen members, 
met once a month with Miss Dorothy Mitchell of the Bureau. 
They held a few extra meetings. 





As in all situations, there were several limiting factors. It 
was not possible to arrange for any financial or released-time 
incentives for participants in the training program. Indeed, 
all faculty advising is “extra” work without financial compen¬ 
sation. The utilization of time proved to be a difficulty. Meeting 
times had to be arranged to suit most participants, and some 
who wished to join were unable to do so. 

Since the College serves the entire State of Kansas, many 
had to miss meetings because of extension or other obligations. 
Faculty members are often used by state and local agencies in 
consulting, judging, and other roles. Many attempted to 
attend other group meetings to make up, but there was some 
loss of continuity and group unity because of this. 

There were some powerful positive factors in operation that 
more than counterbalanced the above limitations. The faculty 
advisers felt that such a program was needed; many felt that 
they were inadequate. There was a feeling, more covert than 
overt, that such a program could contribute to personal 
growth as a teacher and as an individual. 

The students, through their planning conferences, had 
made recommendations that faculty counseling was essential, 
and that advisers should be trained. The deans of the respective 
schools were interested in the creation of the program. All 
concerned displayed a strong spirit of cooperation. 

One difficulty that presented itself after the program had 
started, and one that we had anticipated, was the normal 
one that arises when any group of people, competent in their 
own fields, are called upon to undertake new learnings and to 
use new procedures quite removed from their own. For example, 
many of the advisers have come from the physical-science and 
technological areas where they have long been trained in 
individual research, and where they have conducted classes 
on a lecture as well as laboratory basis. Group thinking and 
group processes were an essentially trying experience for many 
of them at the beginning. The author feels, however, that by 
the use of process observers, and by the resource person from 
the Bureau expressing these “trying” and negative feelings 
when they arose, this difficulty was mostly overcome. 

What did the groups discuss and do? The following is a list 




of topics, discussed by at least two-thirds of the groups, and 
selected out of the protocols: 

1. Test Interpretation 
   (a) Meanings of tests, test scores 
   (b) How to apply the information, interview techniques 

2. Philosophy of Education 
   (a) Who should go to college? 
   (b) Responsibility of College toward student 
   (c) General education 
   (d) Curriculum construction 
   (e) Entrance requirements 

3. The Problem of the Marginal Student: low aptitude, low 
   ability, high level of aspiration 

4. The Role of the Faculty Adviser 
   (a) Responsibility to institution, to student, to self 
   (b) Relationships with students 

5. The Problem of increasing student contacts 
   (a) Use of social gatherings, called by adviser at home 
   (b) Use of upper-class students as group leaders of 
       Freshmen groups (Home Economics School) 
   (c) Other schemes 
   (d) Where does responsibility for initiating contact lie? 

6. Teaching Methods 
   (a) How do you teach students to accept responsibility, 
       think critically? 
   (b) How do you create student interest? 
   (c) Grading and testing 
   (d) Group Procedures 
   (e) Student rating of the faculty 

7. The Dynamics of (Student) Behavior 
   (a) Discussion of specific cases 
   (b) Role-playing: dramatized interviews 

This represents only those areas covered by most of the 
groups. There were some groups which covered other topics, 
including the mental hygiene needs of instructors. On the whole, 
an analysis of the protocols and process charts shows a great 
deal of involvement in the program, many new ideas advanced, 
and much interchange of information among the members 
from the different schools. 

No program would be complete without attempts to evaluate. 
This process is still going on. Our first thoughts on evaluation 
included a pre-test and post-test battery, to measure several 
aspects of the program. The author created, and administered 
to the groups, a pre-test inventory consisting of three parts. 




There was no opposition on the part of the faculty after the 
purpose of the battery was explained, and adequate safeguards 
taken to insure comparative anonymity. The first part was 
designed to measure the individual faculty member's concept 
of what his role is in counseling, and his attitudes toward 
students. This part was a sentence-completion exercise of 
twenty-five items. We are still evaluating the returns on this, 
and the entire Bureau staff is rating the answers in an attempt 
to cut down on the limitations of such a projective device. 

The second part was an information exercise, and the third, a 
miniature case study. The questions raised in the latter were 
revised from Strang’s list in Counseling Technics in College and 
Secondary Schools, and were designed to be useful in training 
as well as evaluation. This was included on the theory that 
a utilization of knowledge in an organized, integrated fashion 
is necessary for effective counseling of the type done by 
advisers. 

Only the Sentence Completion Test was re-administered at 
the end of the semester. It was felt that the other measures 
were too static and not valid in this type of program. The 
third part was used for discussion and as role-playing material 
in some of the groups. 

In addition to this test procedure, reports of the content and 
process of the meetings have been kept, with a member of the 
group acting as process observer, using mimeographed material 
as a guide, and keeping a participation chart, while the resource 
person acted as recorder. Readings of these protocols show 
movement and positive changes and will be used to show 
growth in concept and understanding. Some recordings of 
role playing and discussion were made, and these will be 
used, too. 

At the end of the semester, the author created an evaluation 
questionnaire that was sent to all the participants. The analysis 
of returns is still in process, but the evidence tends to show that: 

1. We have a firm base on which to build additional training 
programs at Kansas State and other comparable institutions. 

2. The program has had repercussions in the classroom teaching 
of the participants. 

(a) Group dynamics procedures have been adapted for 
classroom use and experimentation in classes such 




as senior mechanical engineering laboratory, fresh¬ 
man classes in personal health, classes in journalism, 
education, foods and nutrition, industrial manage¬ 
ment, and others. 

(b) A concern for and understanding of the behavior of 
students has modified grading and other teaching 
procedures. 

3. Relationships with the Counseling Bureau, and use of its 
facilities by faculty have increased. 

4. Advisers feel more adequate in their handling of test 
data, and have made use of the learning in interviews with 
students. 

5. The advisers feel that they now recognize that more res¬ 
ponsibility must rest with the student, both in counseling 
and in class work. 

The evaluation by the faculty also shows that they gained 
much from the heterogeneous make-up of the groups, from the 
method, and from the total approach. Not all was sweetness 
and light, however. Of course, some faculty members, because 
of their own personality, or because of their long years of 
training, felt that such group procedures did not meet their 
needs. Some felt they had come for the facts, and that they 
did not get them presented: one, two, three. One wrote on his
evaluation sheet: “I went into the program in order to have
some expert counselors give me some information on counseling.
. . . I was not interested in serving as a guinea pig for an
experiment in group psychology. If you ever decide to give the
advisers some pointers on counseling they can use, I shall be
glad to participate.” This faculty person attended only one
session and withdrew. He represents an extreme minority.

Although the evaluation process is incomplete at this time, 
the Bureau members feel that the program has been a success. 
One further indication we have is that five groups are going 
strong in this second semester. We had decided to terminate 
the program before Christmas, but none of the groups decided 
to do so. These present groups are all meeting once a week,
because we found that to be the best arrangement in our 
situation. We found that the groups which met each week far 
outdistanced the others in terms of group unity, content 
covered and all-round participation and satisfaction. 

We in the Bureau know that we have learned much from 
the advisers, and have gathered many excellent suggestions
from them. We know that our knowledge of group procedures
has grown greatly from the experience. We have learned much 
about our role in such groups, and about the faculty expectations
of such a role. We are now using that learning in the
spring groups. Perhaps it would be more exact to call this a
“cooperative learning program” rather than a faculty training
program.

We feel that the use of the knowledge of small group dynamics
in creating and operating a large-scale training program
for advisers is practical and successful, and that it can be 
applied effectively in other institutions. We believe such a 
program rests upon the extension of the application of personnel 
techniques by the counselor to the faculty. If the counselor 
respects his faculty colleagues, works with them in a democratic 
fashion, and attempts to meet their needs, he can secure faculty 
cooperation and participation in advising and training. 



A GENETIC STUDY OF SOCIALITY PATTERNS OF 
COLLEGE WOMEN 


DAVID S. BRODY
Montana State University

Introduction 

The present research represents an exploratory study of 
some of the underlying factors determining sociality patterns 
of 140 freshman college women living together in a dormitory 
residence at Montana State University during the academic 
year 1948-49. Sociometric data employed at the residence 
halls in reassigning roommates after a period of six months 
were utilized as a criterion of sociality. Each girl was asked to 
list the names of all girls in the dormitory she would like to 
have as roommates as well as the names of girls she did not 
desire as roommates. Since the girls knew that the data would 
be actually used for room assignments, maximum cooperation 
was secured. During the first week of the Spring Quarter, 
after the girls had moved to their new rooms, they were asked 
to rate the other girls in the dormitory on three traits: leadership,
social qualities, and work habits. In addition, each girl
filled out an inventory indicating the extent of participation in
various home duties and in individual and group activities
prior to her entry in college. She also filled out a Questionnaire
pertaining to the parents’ attitudes and their supervision of
activities prior to her entry into college. Data on the Minnesota
Multiphasic Personality Inventory, which was administered
at the beginning of the academic year, were also utilized in
the study. 

Additional data on participation in individual and group 
activities in college and a measure of student satisfaction with 
college life were secured toward the end of Spring Quarter. 
However, these data have not as yet been analyzed.

Results on Item Analyses 

The initial step in the analysis of data consisted of tabulating 
the number of times each girl was accepted as a roommate. 
The number of acceptances for each girl ranged from 1 to 27 
with a mean of 7.6 choices. 

On the basis of this tabulation, two groups of 30 girls, each
representing approximately the lowest and highest 20 per cent
in the distribution, were isolated.

For purposes of discussion these groups will be referred to as 
the “Unaccepted” and “Accepted” groups. Each girl in the 
unaccepted group received four or fewer choices as a roommate
and each girl in the accepted group received ten or more 
choices. 

Employing these two groups, an analysis was made of each 
of the items on the Inventory and Questionnaire. It was 
found that a number of items did differentiate between the 
two groups at the 5 per cent level of probability or better. 
Altogether, a total of 241 items were employed in this exploratory
study and approximately 20 per cent were found to be
significant.
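The paper does not say which significance test was applied to the individual items. As a minimal sketch, assuming a pooled two-proportion z test on a single response category and using wholly hypothetical counts, a comparison of the two groups of 30 might look as follows. The last lines set the number of items expected to reach the 5 per cent level by chance alone beside the roughly 20 per cent reported.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic with a two-sided normal p-value."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # area in both normal tails
    return z, p_value

# Hypothetical counts (not from the paper): girls checking "much or very much"
# on one item, out of 30 accepted and 30 unaccepted girls.
z, p = two_proportion_z(21, 30, 12, 30)
significant = p < 0.05
print(f"z = {z:.2f}, p = {p:.3f}, significant at 5 per cent: {significant}")

# Figures from the text: 241 items, about 20 per cent significant.
N_ITEMS = 241
expected_by_chance = 0.05 * N_ITEMS
reported_significant = round(0.20 * N_ITEMS)
print(f"expected by chance: {expected_by_chance:.1f}; reported: about {reported_significant}")
```

Since each response category within an item was analyzed separately, more than 241 tests were actually run, so the chance expectation per item is somewhat higher than the figure computed here.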

Items included in the Inventory indicating the extent of 
participation in various activities prior to entry in college were 
classified under three headings: 

Part I. Participation in individual and informal group activities.

Part II. Participation in home duties. 

Part III. Participation in formally organized group activities. 

Each item in Part I and Part II was checked by the student
in terms of frequency. (For purposes of item analysis, three
categories were employed: none or little, some, and much or
very much.) In Part III, pertaining to formally organized
group activities, four categories were employed:

(a) no participation 

(b) member in name only 

(c) participating member 

(d) officer or committee chairman 

Analyses were made separately for each category within 
an item.

In Part I, the following seven items showed significantly
greater participation on the part of the accepted group: 

(a) attending movies 

(b) swimming 

(c) going out on dates 

(d) touchball 

(e) hiking 

(f) social dancing

(g) visiting friends 

Significantly greater participation on the part of the unaccepted
group was shown by the following two items:

(a) playing checkers or chess 

(b) reading 

In Part II, all of the significant items showed greater participation
on the part of the accepted group. These items are:

(a) selected new clothes for myself 

(b) laundered 

(c) made my own bed and straightened out my room 

(d) painted (furniture, walls, etc.) 

(e) canned fruits and vegetables 

(f) cleaned house 

(g) washed and wiped dishes 

(h) chores around barns 

(i) worked in fields (ploughing, sowing and harvesting) 

Similarly in Part III, the items yielding significant differentiation
showed more participation for the accepted group.
These items are:

(a) student government 

(b) high school fraternity or sorority 

(c) school athletic team 

It should be emphasized that a number of other items showed 
consistent differentiation between the unaccepted and accepted 
groups for each of the categories, but fell somewhat short of 
meeting the 5 per cent level of significance. (Data for these 
items will be presented in a subsequent paper.) There would 
appear to be an important difference in the extent of home 
responsibilities between the two groups of girls. In general, 
girls in the accepted group reported that they fulfilled home
responsibilities to a much greater extent than girls in the
unaccepted group.

The Questionnaire on Family Background included items 
designed to indicate parents’ attitudes and their supervision of 
activities prior to entry in college. The first section of this
Questionnaire consisted of 42 items, twenty-one of which
pertained to the father’s attitudes and the remaining twenty-one
to the comparable attitudes of the mother.

Of this group of items, a significantly greater proportion of 
the accepted group indicated that: 

(a) My father provided me with a regular allowance 

(b) While attending high school, my father expected me to 
participate in social activities 

Whereas, more of the unaccepted group indicated that: 

(a) My mother expected me to work for pay outside the home 

(b) My mother tried to push me ahead and to make me excel 

(c) My mother emphasized the importance of good manners 

(d) My father selected clothes and other personal articles 
for me so I wouldn’t make mistakes 

Included in this section were 13 additional items measuring
parent-child and sibling rapport. These items were adapted
in part from Terman’s study¹ on the prediction of marital
happiness. Girls in the unaccepted group indicated a significantly
greater degree of conflict both with their fathers
and with their mothers than did girls in the accepted group.
They likewise showed a greater amount of conflict with their
brothers and sisters.

¹ Terman, Lewis M. and Others. Psychological Factors in Marital Happiness.
New York: McGraw-Hill Book Company, 1938.

Another series of items on family background was designed
to indicate the type of control exercised by the parents relative
to 21 different areas of activities. The students were asked to
check the type of control exercised by the mother and father
separately.

The items appear to indicate less stringent control for girls
in the accepted group. However, before generalizations can
be drawn from the data, analyses in terms of weighted scores
are indicated. This step has not yet been taken. Preliminary
analyses indicate that the significant areas are:

(a) choice of friends of the opposite sex 

(b) going out on dates 

(c) time of coming home from dates or parties 

(d) studying school lessons 

(e) cleaning my room and taking care of personal possessions 

The last section of items pertaining to family background 
dealt with the extent of agreement between the student and 
her father, and between the student and her mother, on the 
type of supervision of the same 21 activities listed under 
parental control. Girls in the accepted group showed generally 
greater agreement with their parents than did those in the 
unaccepted group. 

In summarizing the data on item analysis, it is significant
to note that certain items appear to yield consistent differentiation
between the unaccepted and accepted groups regardless
of the context in which they are found. For example,
items concerning home duties, especially those involving
personal responsibilities, differentiated between the accepted
and unaccepted groups in the activity inventories, in the
Questionnaires on the parents’ attitudes and their supervision
of activities, and in the Questionnaire designed to measure
extent of agreement between child and parent. Likewise,
items concerning association with the opposite sex differentiated
between the two groups of girls on the activity
inventories and on the Questionnaires.

Results on Ratings of Leadership, Work Habits, and Social Qualities

As was indicated earlier in the paper, each girl was asked 
to rate all the other girls in the dormitory on leadership, 
work habits, and social qualities. The students were instructed 
to place a check mark in front of a girl’s name and go on to the 
next name if they had had no opportunity to observe that 
girl or did not know her well enough to rate her. 

Each rating was weighted from 1 to 5, 1 representing the 
lowest rating, and 5 the highest. Ratings for each girl were
tabulated and the mean of all the ratings she received was
computed. The mean rating represented her score on the
particular trait. For each of the three traits the mean ratings
were symmetrically distributed and approximately normal.
The girls were rated most frequently on social
qualities and least frequently on work habits. The number of 
girls who were not sufficiently well known to be rated on social 
qualities was 24 or 17 per cent, on leadership 31 or 22 per cent, 
and on work habits 59 or 42 per cent. Thus, the girls as a 
group felt that they were least able to evaluate others on the 
basis of work habits and most able to evaluate others on the 
basis of social qualities. 
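The scoring just described can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the example ratings are invented, and a skipped rating is represented by None. The counts of girls not rated (24, 31, and 59 of 140) are the figures given in the text.

```python
N_GIRLS = 140  # size of the dormitory group, from the text

# Number of girls not sufficiently well known to be rated, from the text.
NOT_RATED = {"social qualities": 24, "leadership": 31, "work habits": 59}

def mean_rating(ratings):
    """Mean of the 1-to-5 ratings a girl received; None marks a rater who skipped her."""
    given = [r for r in ratings if r is not None]
    return sum(given) / len(given) if given else None

def percent_not_rated(count, total=N_GIRLS):
    """Share of the group not rated, rounded to a whole per cent."""
    return round(100 * count / total)

# Hypothetical ratings received by one girl on one trait.
example_score = mean_rating([4, None, 5, 3, None])

percentages = {trait: percent_not_rated(n) for trait, n in NOT_RATED.items()}
for trait, pct in percentages.items():
    print(f"{trait}: {pct} per cent not rated")
```

The rounded percentages reproduce the 17, 22, and 42 per cent reported above.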

The number of acceptances each girl received was correlated 
with the mean rating on each of the three traits. The highest 
correlation with acceptance scores was obtained for social 
qualities. This correlation was .59. The correlation with 
leadership was .52 and with work habits, .20. Ratings for 
social qualities and leadership are certainly significantly 
related to acceptance scores, but it is obvious that social 
qualities and traits of leadership are by no means the only 
factors determining acceptability. The possession of good 
work habits is apparently of minor importance in the selection 
of roommates. 
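The paper reports these correlations without test statistics. A rough check, assuming the usual t test for a Pearson correlation, can be sketched as below. The sample size behind each coefficient is not stated, so two assumptions are shown for work habits: all 140 girls, and the 81 who remain if the 59 girls not rated on that trait are dropped.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing that a Pearson r differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_ALL = 140               # all girls in the dormitory (from the text)
N_WORK_RATED = 140 - 59   # assumption: girls not rated on work habits are excluded

t_social = t_for_correlation(0.59, N_ALL)
t_leadership = t_for_correlation(0.52, N_ALL)
t_work_all = t_for_correlation(0.20, N_ALL)
t_work_rated = t_for_correlation(0.20, N_WORK_RATED)

for label, t in [
    ("social qualities, n = 140", t_social),
    ("leadership, n = 140", t_leadership),
    ("work habits, n = 140", t_work_all),
    ("work habits, n = 81", t_work_rated),
]:
    print(f"{label}: t = {t:.2f}")
```

With degrees of freedom in this range the two-sided 5 per cent critical value of t is close to 1.98, so whether the work-habits correlation of .20 reaches significance depends on how many girls actually entered the calculation.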

When the correlations were computed between work habits 
and ratings on leadership and on social qualities, they were 
found to be .49 and .32 respectively. Thus, we find that leadership
correlates almost as highly with work habits as with
acceptability.

Although there is a significant relationship between work 
habits and social qualities, it is considerably lower than 
between work habits and leadership. However, leadership and 
social qualities are highly related to each other as evidenced
by a correlation of .85. It can be hypothesized that social 
qualities are an important determinant in the selection of 
leaders among this population, but that work habits constitute 
another variable which is significant. 
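One way to examine this hypothesis, not carried out in the paper, is a first-order partial correlation. The sketch below treats the three rounded coefficients reported above as exact and as coming from the same girls, both of which are assumptions.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Coefficients reported in the text.
R_WORK_LEAD = 0.49
R_WORK_SOCIAL = 0.32
R_LEAD_SOCIAL = 0.85

# Work habits with leadership, social qualities held constant.
work_lead_given_social = partial_r(R_WORK_LEAD, R_WORK_SOCIAL, R_LEAD_SOCIAL)

# Work habits with social qualities, leadership held constant.
work_social_given_lead = partial_r(R_WORK_SOCIAL, R_WORK_LEAD, R_LEAD_SOCIAL)

print(f"work habits and leadership, social qualities constant: {work_lead_given_social:.2f}")
print(f"work habits and social qualities, leadership constant: {work_social_given_lead:.2f}")
```

On these figures the relation between work habits and leadership remains at about .44 when social qualities are held constant, which is in line with the suggestion that work habits act as a separate variable in the choice of leaders.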

Results on Minnesota Multiphasic Personality Inventory 

Since all members of the freshman class were given the
MMPI at the beginning of the Fall Quarter, 1948, as part of the
Orientation Week Testing Program, it was decided to utilize
these records to determine whether the accepted and unaccepted
groups could be differentiated in terms of this
personality inventory.

In the present study, 28 MMPI records were available on
the unaccepted group and 30 on the accepted group. On the 
keys pertaining to the specific personality variables, it was 
found that the mean scores of the unaccepted group were 
consistently higher than those for the accepted group, with the 
exception of the mean score on the hypomania scale. When 
the ‘t’ values were calculated, only one was found to be significant
at the 5 per cent level of probability or better; this
was the scale pertaining to schizophrenia. Since the schizophrenia
scale is theoretically measuring withdrawal tendencies,
it is the one on which we might have expected to attain the 
maximum differentiation. 

When we compare the standard deviations, we find that the 
unaccepted group is consistently more variable than the 
accepted group on all the scales except depression. When these 
differences in variability were tested for significance by the 
calculation of the F ratio, the psychopathic deviate, the 
schizophrenia and hypomania scales showed significant 
differences in variability at the 5 per cent level or better. 
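No scale scores are published in the paper, so the two comparisons described above can only be illustrated with invented data. The sketch below assumes a pooled-variance t test for the means and a simple variance ratio (larger over smaller) for the F test; the two short lists of scores are hypothetical and far smaller than the actual groups.

```python
import math
import statistics

def variance_ratio(sample_a, sample_b):
    """F ratio with the larger sample variance on top, plus its degrees of freedom."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)

def pooled_t(sample_a, sample_b):
    """Pooled-variance t statistic for a difference in means, plus its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, n_a + n_b - 2

# Hypothetical scale scores (not from the paper).
unaccepted = [48, 55, 62, 70, 45, 66, 58, 74]
accepted = [50, 52, 55, 49, 53, 51, 54, 48]

f_ratio, f_df = variance_ratio(unaccepted, accepted)
t_value, t_df = pooled_t(unaccepted, accepted)
print(f"F = {f_ratio:.2f} with df {f_df}")
print(f"t = {t_value:.2f} with df {t_df}")
```

Each statistic would then be referred to a table of F or t at the stated degrees of freedom. When the variances differ as sharply as in this invented example, a pooled t test is itself open to question, which is one reason for examining the F ratio first.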

The consistency of these differences in means and standard 
deviations suggests that the members of the unaccepted 
group are more likely to have personality disturbances than the 
members of the accepted group. The greater variability of 
scores for the unaccepted group is a reflection of the larger 
number of deviant scores in the direction of abnormality. 

Differences in variability on the L, F, and K scales were all 
significant at the 1 per cent level or better, the unaccepted
group being more variable on the F and K scales and the accepted
group more variable on the L scale. Differences in mean 
scores on these three scales were not significant. 

Summary and Conclusions 

The data which have been presented in this paper suggest 
that the student’s experiences within her family group and the 
pattern of her activities prior to entry in college are important 
determinants of her social acceptability. For example, girls in
the accepted group indicated significantly greater participation
in home duties, especially those involving personal responsibilities.
In a comparison of activity participation outside the
home, girls in the accepted group showed more participation in 
social activities whereas girls in the unaccepted group showed 
more frequent participation in relatively solitary activities. 
The evidence also points to the fact that the parents of the 
unaccepted group tended to be overprotective and to discourage 
the development of independence. The girls in the accepted 
group felt that their parents encouraged social development 
to a greater degree than was true of the girls in the unaccepted 
group. 

Apparently girls in the accepted group came from homes 
in which there was less conflict and greater harmony than 
was true of the homes of girls in the unaccepted group. The 
results on the MMPI suggest that girls in the unaccepted 
group evidence a greater tendency toward abnormality. 

Reilly and Robinson² in their study of popularity among
college women point out the importance to counselors of 
obtaining some index of the probable social acceptance of an 
entering freshman. Their report shows that the usual college 
entrance data are relatively ineffective for predicting social 
acceptability. Of interest is their recommendation that academic
census data need to be supplemented by more vital
statistics from the adolescent world. Certainly, the present 
study points to the value of this approach. For the personnel 
worker it means that if he is to understand the dynamic 
factors underlying social behavior at the college level, he 
must orient his thinking in terms of the developmental history 
of the individual. 

² Reilly, J. W. and Robinson, F. P. “Studies of Popularity in College: I. Can
Popularity of Freshmen Be Predicted?” Educational and Psychological Measurement,
VII (1947), 67-72.



HOW TO GO ABOUT THE PROCESS OF EVALUATING 
STUDENT PERSONNEL WORK 


WILLIAM M. GILBERT 

Director, Student Counseling Bureau, University of Illinois 

The title of this paper is somewhat misleading and needs 
to be clarified. The title implies that there is a specific process, 
that student personnel services can be neatly defined, that there 
are perfectly valid criteria for determining the effectiveness and 
efficiency of these services and, finally, that the necessary formula
for going about the process of evaluation can and will be
supplied in cook-book fashion. Unfortunately, not one of these
implications is justified. There is no one most-desirable way of 
going about the evaluation process. Student personnel services 
cannot be defined at all neatly; there are no criteria known to 
be perfectly valid and I have no secret formulas. 

With these few positive statements I should possibly end this 
paper dramatically and sit down. However, student personnel 
services will not continue to be accepted on faith indefinitely. 
Eventually some discerning college or university administrator
will rightly ask: Just what are the purposes of student personnel
service and what is the evidence that these goals are
being attained, or how can we get this evidence? We could not 
avoid the issue even if we wanted to. 

Mr. Blaesser, in his statesmanlike and visionary address of 
last year, and after reviewing the various attempts at over-all 
evaluation of student personnel programs, faced the issue 
squarely. He emphasized: “This means a total institutional
study of the needs of the students coming to the institution, 
and the evaluation of the outcomes of the total educational 
experiences at the institution.” 

It was rightly explained by Mr. Blaesser that it would be a 
“Shangri-la” university where such far-reaching, highly cooperative
and expensive over-all and long-range evaluation of
higher education could be carried on.

In the meantime, before such a university evolves, the college
or university administrator is still going to have to allot
funds to the various student personnel agencies and he is still
going to want to know what evidence there is that the different
objectives are being reached. He will probably not insist on
perfectly valid evidence because he will be one of the first to
recognize that perfection can be aimed at but cannot be expected
in broad educational endeavors. And we, as personnel
workers who are sincerely interested in the welfare of students, 
will want to know how effectively and how efficiently we are 
serving their welfare. 

It is not possible or polite for me to make judgmental statements
about the different colleges and universities you represent.
However, I am sure it will not be held against me by
President Stoddard and Provost Griffith if I make the simple
observation that while the University of Illinois is one of the
great universities, it is certainly not a “Shangri-la” university.
We have our problems too, as most of the rest of you do. There
appear to be good spots and not-so-good spots in our over-all
student personnel program. Most of you probably have what
appear to be good spots and not-so-good spots in your programs
too. The desirability of some type of general evaluation is probably
quite clear. One of the problems is how one should go
about this process. Perhaps, by discussing some of the procedures
which have been used at Illinois and some of the plans
and hopes we have, ideas for developing evaluation procedures
which will fit your own local situation may occur to you. Conversely,
any suggestions you have for us will be deeply welcomed.

One of the first problems to be faced both chronologically
and in terms of importance is that of securing general, grass-roots
acceptance of any type of evaluative procedure.

In most colleges and universities, evaluation immediately 
poses a number of serious problems which must be faced. When 
we evaluate counseling services, when we evaluate registration 
procedures, when we evaluate health services, and when we 
evaluate instructional services, we are evaluating not simply 
services, but, perhaps even more importantly, we are evaluating
the persons who are responsible for such services and the
persons who perform the services. As personnel workers we
should probably be the first to recognize that many of our 
student personnel are not as good as they should be and that 
any evaluation of them immediately can serve as a threat to 
the individuals concerned. 

Several years ago the Student Counseling Bureau at the 
University of Illinois conducted a questionnaire study of student
attitudes regarding the effectiveness of the counseling
services they received. This Questionnaire, which went out to
some 3000 students, was devised by members of the full-time
psychological staff with full consideration given to suggestions
made by the trained part-time faculty counselors who are a
part of the Bureau staff. Nevertheless, faint rumblings of concern
came to my ears, and since the purpose of the investigation
was not that of evaluating individual counselors, but the
program as a whole, the counselors were reassured that there 
would be no individual breakdown of the findings. Nor was 
there. 

Just a year ago, in response to the recommendations of a
committee appointed to study the problem of the recognition
of faculty counseling, the Bureau was given the responsibility
and privilege of making formal recommendations as to increases
in regular academic salary and rank for Faculty Counselors.
The exact statements in Provost Griffith’s letter of
March 23, 1949, are sufficiently noteworthy to deserve quotation:

This whole problem has been studied recently by a special 
committee appointed for the purpose. I am approving the 
recommendations of this committee as follows: 

1. Policy. A positive program of counseling services to students
based on the best clinical and guidance practices
has become and should remain an integral part of the educational
experiences we offer to students. The persons who
do this type of work well should be rewarded for it and
advanced in rank and salary in proportion to their excellence.

4. Rank and Salary. Recommendations for changes in rank
and salary of personnel listed in the budget of the Student
Counseling Bureau, insofar as counseling services are
concerned, will originate with the Director of the Bureau
and college offices to the general administration.

These significant forward steps in student personnel practice
seemed to deserve a very careful consideration of any recommendation
for increases in rank and salary that would be made.
Consequently, within the past several months the problem of
improving the Director’s evaluation of the counseling effectiveness
of individual counselors was presented to the group. It
was decided that an Evaluation Committee should be elected
consisting of two of the faculty counselors and two of the central
staff members. The general theoretical and practical problems
of evaluating the effectiveness of counseling were considered
democratically but briefly at a general staff meeting.
The Evaluation Committee then went to work and presented
a series of recommended evaluation procedures. These were
then discussed at another general staff meeting.

The next step consisted in having the counselors check the 
evaluation procedures which had been recommended by their 
Committee. Their checked lists were sent in without signature. 
I should like to report some of the conclusions and recommendations
of the evaluation committee:

I. The committee members agreed on the following statements
as starting points affecting all recommendations on
specific methods:

1. Though various attempts to evaluate counseling have 
been reported in the literature, the validity of no 
method has been established.

2. No single method should be used as the sole basis 
of evaluating counseling. 

3. Every method used should be on a trial basis. 

4. Training, supervision, and evaluation are inseparable. 

5. Outcomes of whatever methods are used to evaluate 
counseling may serve as guides to further training of 
counselors. 

6. We do not feel it necessary to recommend specifically
such obvious, continuous, and informal procedures as
evaluating counselors for regularity and dependability
in attending to duties, cooperation in the work of
the Bureau, private consultation with the Director,
participation in staff discussion, research, and performance
of special duties such as taking part on the
staff programs, work on committees, and the like.
We feel that our assignment is to suggest more formal,
specific, objective, special-occasion procedures to supplement
these informal ones.

7. Morale of the staff and, therefore, of each Counselor
is a prime consideration in the selection and application
of procedures.

8. Each staff member should feel free to submit additional
evidence (such as recordings, additional participation
in fake interviews, etc.) in his own behalf
and beyond whatever evidence would otherwise be
used in evaluating his work.

II. We recommend that the present methods of evaluation
by the Director be continued, and that the staff consider
additional methods as possible supplements.

III. We recommend to the staff for consideration: 

1. Intake conferences. Staff members would meet in small
groups on Wednesdays when no meetings of the entire
staff are scheduled. A central staff member would lead
the group. Staff members would summarize their work
since the previous meeting. Specific problems could be
brought before the group for discussion. The Director
would divide his time among the groups.

2. A survey of clients by mail questionnaire. This should
cover each counselor’s entire client group for a given
semester, with the client anonymous and the counselor
identified on the outgoing questionnaire. We would
suggest that a committee be appointed to make the
Questionnaire and that the committee include those
staff members who worked on the similar questionnaire
used previously.

These recommendations received the unanimous approval of
all Counselors who then suggested that the Questionnaire survey
of several years ago be analyzed further to determine
whether the type of Questionnaire used would actually discriminate
between different counselors.

These few examples indicate the importance of securing the 
acceptance of the persons involved in any evaluation procedure 
and the importance of attempting to minimize any feelings of 
threat which evaluation would involve. It is assumed that not
all threatening aspects of an evaluation procedure can be eliminated
completely. If one attempts to eliminate all possibility
of threat, then it is probable that a laissez faire policy will ensue 
which will result in no progress. 

Even though the evaluation of individual counselors in a 
counseling bureau presents the issue of securing the acceptance 
of evaluation in its most critical form, it is still a considerable 
distance removed from the general goal of securing acceptance 
for an over-all evaluation of all student personnel agencies. At
least a tentative acceptance of the desirability of an over-all
evaluative procedure by the persons and agencies who would 
be evaluated would be desirable even before the appointment 
of an evaluation committee such as that suggested in Mr. 
Blaesser’s address of last year. It would be possible, of course, 
for some interested agency, such as the Counseling Bureau, 
which has already carried on a self-evaluative procedure, to 
recommend to the higher administration that such a committee 
be appointed. If such a recommendation were acted upon favorably,
in the absence of prior consultation with the various
student personnel services, it seems possible that unnecessary 
protests and eventual lack of real cooperation from some of the 
agencies would be the result. 

At Illinois this second step in evaluating student personnel
work, that is, the appointment of an over-all evaluation committee,
will probably be approached in a somewhat different
manner. In connection with the authority given the counseling 
bureau to make recommendations with respect to increases in 
rank and salary for faculty counselors there was also appointed, 
at the request of the Bureau, an advisory council. I quote from 
the Provost’s letter again: 

An Advisory Council to the Director of the Student Counseling
Bureau is authorized, this Council to be composed of a
representative of each college and school. Membership in the
Council will be on a revolving basis with members appointed 
for three-year terms. For the first year, the one-year and the 
two-year and the three-year appointees shall be determined 
by lot. A vacancy will be filled by a staff member from the 
college or school which loses a representative on account of 
the rotating membership. 

In order to set up this Advisory Council, I should appreciate
having an early nomination from each dean and director.

This Council has been meeting with the Director of the 
Counseling Bureau each month during the present school year. 
One of the main problems which has been considered by the 
Council is the effectiveness of the various college registration 
advisory systems. These advisory systems do not appear to be 
of equal effectiveness in all colleges, a condition which has re¬ 
sulted in the publication from time to time of critical editorials 
in the school paper. As a result, the Advisory Council recommended
to the Director that questionnaire appraisal be made
of the various advisory systems with questionnaires being sent 
to the students affected, the advisors, and to the academic 
deans. The remainder of this paper will consist of a description 
of plans and hopes which the Director of the Counseling Bureau 
now has. 

It is hoped that before any evaluation of the advisory systems 
is actually put into effect it will be possible to secure approval 
for an over-all evaluation of student personnel services. Specifically,
it is hoped that it may be possible to secure the adoption
of both the general evaluative procedure suggested by Dr. 
Kamm and Dr. Wrenn, and of the one which Dr. Kamm will 
describe to you later today. Securing the adoption of these pro¬ 
cedures or modifications of them will possibly not be an easy 
task. It is one which probably can be accomplished, however, 
provided the various individuals concerned have time to con¬ 
sider the proposals and are given the opportunity of making 
suggestions regarding them. 

The next step would be to recommend to the higher admin¬ 
istration that an over-all Evaluation Committee be appointed. 
This Evaluation Committee should probably consist of repre¬ 
sentatives of all of the various colleges and schools as well as 
representatives from all of the different student personnel agen¬ 
cies including the Dean of Student’s Office, the Office of Admis¬ 
sions, the Health Service, the University Union which carries 
on a broad program of student activities, the Speech Clinic, 
the Housing Division, and the Placement Bureau, and possibly 
student representatives. 

The third step in going about the process of evaluation would 
naturally follow from this second step. It would seem that the 
first task of the Evaluation Committee would be to discuss the 
results of the over-all, general evaluation of student personnel 
services and then to proceed to the problem of making a more 
detailed evaluative study of those services which appeared to 
be most in need of strengthening. The whole problem of criteria 
of effectiveness and efficiency of student personnel services 
would probably concern this Committee for some time. Since 
Dr. Strang will probably discuss with you the limitations of 
various criteria on the basis of which student personnel services
might be evaluated, I will not attempt to examine these questions
with you. It seems probable that the list of criteria suggested
in the revised brochure “The Student Personnel Point
of View” published by Dean Williamson’s Committee under 
the auspices of the American Council on Education, as well as 
other more specific criteria, such as those suggested by Dr.
Aiken in his report to the Fourth Annual National Conference 
on Higher Education in April of last year, would be considered. 

It might be reasonably expected that the Evaluation Com¬ 
mittee, after considering the various criteria and various meth¬ 
ods of procedure which could be used, would refer the problem 
to representatives of each of the different student personnel 
services for further consideration and for recommendations as 
to specific methods and procedures of carrying on an evaluation
program in their own agencies. 

While there are many possible objections to a Questionnaire 
type of appraisal of student personnel service it is one of the 
few practical and not prohibitively expensive means for securing 
some rough estimate of the apparent value of the service. There 
is one point in connection with Questionnaire surveys which I 
feel has not been adequately emphasized and that is that a 
student’s responses to a Questionnaire will necessarily be in¬ 
fluenced by the knowledge which the student possesses, not 
only of the services which are actually available but of those 
which theoretically could be made available. Thus, as part of a 
Questionnaire appraisal of any given service it would seem ad¬ 
visable to supply the student with a description of what services 
might reasonably be expected from any given type of agency. 

From one point of view, at least, it may be fortunate that 
students are not more aware than they are of some of our more 
specifically stated objectives. It might be of considerable inter¬ 
est, for example, to submit to a representative group of students 
in any of our colleges and universities the eleven objectives of 
general education recommended in the report of the President’s 
Commission and to have the students indicate on a simple scale 
the degree to which they felt their general college education 
already had, or seemed to be, in the process of helping them to 
reach these goals. The results could be startling. 

After the more specific evaluation proposals of the different
student personnel services had been referred to the Evaluation
Committee for approval, and after the evaluations had been 
carried out, the fourth step in the process of evaluation would 
then confront this Committee. This fourth step would consist 
in the Evaluation Committee's carefully examining and dis¬ 
cussing the results of the detailed evaluation of each agency and 
of their arriving at a series of specific recommendations which 
would be automatically transmitted to the Director or person 
in charge of the specific student personnel agency. Such a series 
of recommendations should, of course, be influenced by a func¬ 
tional analysis of all student personnel services with the view 
of expanding those services which needed expanding and of 
contracting those which seemed to be over-expanded. This, as 
most of you will recognize at once, is one of the most difficult, 
and delicate, and perhaps even dangerous steps in the whole 
process of evaluation. It is my own experience that practically 
every Director of any student personnel service is firmly convinced
that his service would improve immeasurably if it were
only expanded. This suggests, of course, that the Chairman of 
the Evaluation Committee should be a person of the greatest 
possible diplomacy, wisdom, and ruggedness. In addition it 
would be highly desirable if the college or university Admini¬ 
stration would be able to indicate, within some fairly definite 
range, at least, the total amount of funds which might reasonably
be expended for all student personnel services. It seems
possible that a wire recording of the proceedings of the Evalu¬ 
ating Committee at this point could provide some valuable 
research material for determining the extent to which the lead¬ 
ers of various personnel services were actually interested only 
in the welfare of students. 

The next step in the total evaluation process would consist 
of repeated and improved evaluations of the various personnel 
services at regular intervals. This should prove to be a relatively 
easy task if the other steps in the process already mentioned 
have been successfully negotiated.

The final step in going about the process of evaluation might 
then consist of an over-all basic evaluation of the outcomes of 
higher education including instruction. At this point, the Evaluation
Committee would probably have to be enlarged to include
other standing Committees in the university such as the Educational
Policy Committees, the Admissions Committee, and
others. It seems that any university which has actually reached 
this stage in the evaluation process should have little difficulty 
in securing the large funds necessary for the over-all evaluation 
of their educational program from any one of the national 
organizations which would subsidize research. In addition, that 
college or university should be placed at the top of some roll of
honor which would be devised by the American College Personnel
Association.

What has been said can be summarized in a few sentences.
The way to go about the process of evaluating student person¬ 
nel services is to take account of what we know about people in 
general and to make full use of good democratic administrative 
procedures at every step in the process. If the process of evalu¬ 
ating student personnel services cannot be carried on in this 
fashion a very critical examination of the whole basic structure 
and functioning of the college or university itself needs to be 
accomplished first. 



MAJOR LIMITATIONS IN CURRENT 
EVALUATION STUDIES 


RUTH STRANG 

Professor of Education, Teachers College, Columbia University 

Evaluation is a complicated business. It necessitates (1)
clarifying goals or objectives; (2) devising methods and instruments
for securing evidence that each of these specific objectives
has or has not been attained; (3) gaining information
about the changes that have taken place in individuals, groups,
or community; and (4) passing judgment on the “goodness”
of the changes. An excellent review of the literature was published
in January, 1949, by Froehlich.¹

The evaluation of evaluation is still more difficult. This is 
because there are so many kinds of end results and processes 
to be evaluated—the personnel program as a whole, the ade¬ 
quacy of staff, the provision of certain services, the processes 
of counseling and of group work. Moreover, these are evaluated 
for different purposes and on different levels of scientific precision.
For example, a teacher may use informal evaluation
methods, such as obtaining from students a simple written 
statement regarding the effectiveness of his teaching or holding 
a group discussion of the methods used in the course. These 
suggestions for improving his teaching may be very useful in 
modifying instruction for the better even though they meet few 
of the criteria of scientific evaluation. The effective teacher 
continuously studies his students’ progress toward the definite 
goals in the course. 

Despite its difficulty, evaluation of personnel work is neces¬ 
sary if the college personnel officer is to maintain his status. 
Administrators, the general public and students want to see 
results; they demand proof of the effectiveness of counseling 
and group work. 

¹ Clifford P. Froehlich. Evaluating Guidance Procedures. Washington, D. C.: Federal
Security Agency, Office of Education, 1949.





With the increased interest in evaluation in every area of 
education, methods of evaluation of personnel work have been 
improved. But because of the difficulty and complexity of as¬ 
certaining changes produced by student personnel procedures, 
there are still major limitations in current evaluation studies— 
in surveys of the program as a whole, in evaluation of different 
services, in appraising various kinds of counseling and psycho¬ 
therapy, and in the evaluation of group work procedures. 

Surveys of the Personnel Program 

Surveys of personnel programs tend to be either anecdotal or 
atomistic. The anecdotal type are valuable in giving glimpses 
of present practice which can be appraised theoretically. They 
fall short of adequate evaluation in being somewhat subjective 
—the investigator may select the aspects that appeal especially 
to him; if his mind-set is critical, he will focus on the unfavor¬ 
able procedures; if his mind-set is favorable, he is likely to note 
the incidents that will create a good impression. Almost every¬ 
one has an unconscious bias that is difficult to recognize and 
control. 

The detailed lists of criteria on administrative leadership, 
provisions and facilities for guidance, and in-service education; 
on the preparation and qualifications of the guidance staff, 
their growth in service; the specialized services available; the 
guidance and informational services available to students; the 
counseling and placement services; follow-up studies; relation 
of guidance to curriculum and instruction; use of community 
resources—this detailed analysis of the program is very useful 
in calling attention to the possible scope of the program and to 
standards in training and performance. It falls short of effective 
evaluation in three important respects: 

1. It is too atomistic: it considers each item separately without
much attention to its relative importance and relation to
other items. For example, in a college in which the faculty- 
student load was very small, the faculty members were selected 
with reference to their qualifications for counseling, and the 
faculty adviser was the key person in the guidance program, 
the need for special personnel workers would be quite different 
from that in a college having a traditional subject-centered 
faculty. 





2. The qualitative aspect is neglected. In two colleges, both
reporting individual interviews with students, one might have 
interviews of a high quality, while the interviews in the other 
institution might be perfunctory and even detrimental. Simi¬ 
larly, autobiographies might be used in one college to help stu¬ 
dents to gain self-understanding, and in another college they 
might increase the students’ insecurity and anxiety. In one 
college the cumulative records might be kept up to date and 
used much more effectively than in another institution. The 
check list or scale type of evaluation does not supply data on 
the important qualitative aspects. 

3. The effect of the qualifications and services on the students 
is not known; in other words the crucial question of evaluation 
is not answered, namely, “Do the procedures we believe to be
effective really make desirable changes in students, in groups, 
and in the community?” 

In studying the personnel work in a college, little progress 
has been made in defining concretely the changes that should 
result from an effective personnel program. Last year at the 
annual convention, one large group pooled their opinions on 
this subject and listed specific changes in students’ behavior and 
attitudes, faculty cooperation, group activities, and in the com¬ 
munity, which they thought should be the outcome of person¬ 
nel work. 

Evaluation of Different Services 

Educational and vocational guidance are two services that 
have most frequently been subjected to evaluation. Much dis¬ 
satisfaction has been expressed regarding the usual criteria of 
success of vocational guidance—number of positions held, 
length of time positions were held, reasons why person left the 
position, reports by employer of worker’s proficiency and job 
satisfaction of worker. Obviously, a combination of these cri¬ 
teria is more satisfactory than any single item. In his evaluation 
of the State Consultation Service at Richmond, Virginia, Froehlich²
moved toward a more adequate combination of criteria:
criteria of occupational adjustment and personal adjustment,
the client’s attitude toward the counseling service and change
in occupation, and his preparation for the job. Admirable as this
effort is to obtain the most accurate opinions and to apply statistical
methods as a test of the reliability and validity of the
ratings, it has certain important limitations, clearly recognized
by the investigator:

1. The agreement between the interviewer’s and counselor’s
rating for occupational adjustment was not as high as desired.

2. Some of the questions are ones on which the client would
not be expected to have much basis for judgment, such as the
relative value of different counseling procedures, especially as
the client’s attention was not focused on the process.

3. The interviewer’s basis for rating the client’s adjustment
was meager.

4. Much more information is needed about the individual’s
capacity and the environmental conditions that might make
vocational and personal adjustment either easy or difficult for
him, overriding, as it were, the effect of the counseling service
per se.

² Clifford P. Froehlich. “Toward More Adequate Criteria of Counseling Effectiveness,”
Educational and Psychological Measurement, IX (1949), 255-267.

A much more specialized aspect of evaluation of the college 
advisory system is to be presented at this meeting by Friedenberg.
This represents an ingenious and detailed attempt to
have the recipients of the service evaluate faculty advisers. 
From such an evaluation the faculty adviser can obtain many 
helpful suggestions for the improvement of his services. It 
clarifies the areas in which the faculty adviser can best work, 
and indicates the need for specialized services. The same limita¬ 
tion as was mentioned in the preceding study holds here, 
namely, the students’ inadequate basis for evaluating a process 
in which they have had so little background of experience or 
study. However, the concrete cases do give the student an op¬ 
portunity to focus attention objectively on the counseling proc¬ 
ess. After having obtained this information the problem of 
appraisal is still unsolved: Who is right—the student or the 
person who has studied counseling and psychotherapy? 

Evaluation of Psychotherapeutic Counseling 

Considerable work has been done on evaluating the non¬ 
directive interview. Much of this has been along the line of 
showing increased insight on the part of the client as the interviews
continue. The assumption is that insights expressed in
the interview are in themselves evidences of adjustment and 
will affect life adjustment. This assumption has been ques¬ 
tioned. Consequently, evidence of adjustment in life situations 
over a long period of time has been considered the only valid 
measure of the success of the psychotherapeutic interview. 

Even this criterion has its limitations insofar as environmen¬ 
tal conditions may be so destructive as to prevent the good 
adjustment that might have taken place under ordinary con¬ 
ditions. Another limitation is the lack of evidence of the indi¬ 
vidual’s initial capacity for adjustment. If the client’s problem 
is deep seated, persistent, and pervasive, failure to show much 
progress could not be attributed to poor counseling techniques. 

Evaluation of Group Work Procedures 

As in the evaluation of interviews, too much reliance has been 
placed on subjective evaluation of the group work process. 
Some recent studies, however, have obtained reports from the 
participants themselves and from those who have had an op¬ 
portunity to observe them some months later. For example, 
Lippitt³ obtained evidence of actual change in the performance
of leaders who had spent two weeks in a workshop that featured 
group discussion, role-playing in sociodrama, and interviewing. 
Both outside observers and the members of the workshop re¬ 
ported that because of the workshop they were able to do more 
effective work with their community groups. 

The College Evaluation Officer

A new position seems to be emerging in colleges and universi¬ 
ties. This is the college evaluation officer, with training in meas¬ 
urement and evaluation. This work is closely related to, and 
has often grown out of, the research function of the personnel 
department. Such an officer was described by Findley in a 
meeting of the American Educational Research Association. 
This officer would render valuable advisory service to the fac¬ 
ulty in defining objectives, developing instruments to measure 
them, assisting in the collection of data, and appraising and 
interpreting the information collected. 


³ Ronald Lippitt. Training in Community Relations; a Research Exploration Toward
New Group Skills. New York: Harper and Brothers, 1949.




Summary 

The major limitations in evaluation studies seem to be: 

1. Failure to define the outcomes of personnel work con¬ 
cretely as desirable measurable changes in students, faculty 
members, groups, and community. 

2. A too narrow approach instead of a comprehensive study. 
All of the approaches that have been used in evaluating guid¬ 
ance procedures have some value. We need to know about the 
staff and the procedures being employed; student opinion and 
expert opinion as to the effectiveness of the procedures are helpful;
follow-up studies supply essential information on life adjustment.
The intensive study of specific techniques and the
control-group and within-group experimental methods also
contribute to our understanding of the effectiveness of student
personnel work.

3. Mass rather than individual treatment of the data col¬ 
lected. Instead of studying the data collected as a group, an 
appraisal of each student should be made individually in the 
light of his previous progress. This is the case-study approach
to evaluation. It seems to be the only adequate way to appraise
changes in students. It enables the investigator to take into
account the student’s capacity for adjustment to college and
environmental conditions that may be reinforcing or defeating
the college personnel program. A case study is made of each
student; these records are studied individually and a judgment
made of the student’s social, emotional, physical, and intellectual
development. These judgments may then be treated statistically
and checked as to reliability and validity. In the case
study approach to evaluation the service and the research 
functions of student personnel work come together; one rein¬ 
forces the other. 



AN INVENTORY OF STUDENT REACTION TO
STUDENT PERSONNEL SERVICES

ROBERT B. KAMM 
Dean of Students, Drake University 

Introduction 

Increasingly, we are becoming aware of the need for evalu¬ 
ation of our student personnel programs. Now that the peak 
veteran enrollment has passed and we are faced with somewhat 
declining enrollments and the corresponding reduction in in¬ 
come, we need, all the more, to be able to take stock of the 
quality of our services. 

Just a year ago, considerable time was spent at this conven¬ 
tion in a discussion of the evaluation of student personnel serv¬ 
ices. Dean Willard W. Blaesser, then of Washington State 
College and now with the United States Office of Education, 
spoke on the subject “The College Administrator Evaluates 
Student Personnel Work” (1). Dr. John H. Rohrer, Professor
of Psychology at the University of Oklahoma, presented a paper 
entitled “An Evaluation of College Personnel Work in Terms 
of Current Research on Interpersonal Relationships” (4). 

A comprehensive review of the literature dealing with evalua¬ 
tion was presented by Blaesser. Reference was made to such 
studies as those of Hopkins (3) in 1925, Brumbaugh and Smith 
(2) in 1930, Williamson and Sarbin (5) in 1940, as well as others. 
As the title of his paper indicates, Blaesser dealt with evaluation 
from the point of view of the administrator. 

But what about the student? Does he think what we have to 
offer is of value? Are our various services really functional in 
his college experience? Are we supplying those services which 
really meet his needs? How about securing “consumer reac¬ 
tion” to our student personnel services? 

The above, and other questions, were asked last year at this 
convention. In fact, there was so much interest in the general 
subject of evaluation that the Program Committee has again
seen fit to provide a session in which the problem may be discussed.

A Student Reaction Form.—For some two or three years now
Dr. C. Gilbert Wrenn, Professor of Educational Psychology at 
the University of Minnesota, and I have been experimenting 
with a student evaluation form for student personnel services. 
In its various stages of refinement it has been used at a number 
of institutions with limited success. Recently, an “all-out ef¬ 
fort” has been made to eliminate some of the remaining “bugs” 
and we feel that now we may have an instrument which is 
reasonably valid and which can be functional in the evaluation 
of student personnel services. 

Actually, the form has been designed with the thought in 
mind that it might well be used in conjunction with an evalua¬ 
tion form which Dr. Wrenn and I described in an article in 
School and Society in 1948 (6). The earlier form, entitled “An 
Evaluation Report Form for Student Personnel Services” is for 
the use of trained personnel workers and combines judgments 
with regard to institutional philosophy toward the program 
and actual evidence of specific services. The present form, used 
in conjunction with the previous form, should give a compre¬ 
hensive evaluation of a student personnel program, in that 
reactions of both students and the trained personnel worker 
are utilized. 

Often judgments are made, relative to the value of a service, 
on the basis of a few students’ reactions to a question or two. 
The present form is based upon the principle that if several 
pertinent questions about a particular student personnel service 
are asked of a sufficiently large random sample of the local college 
population, a valid indication of the worth of the service to those 
students will be available. 

Sixty questions, five for each of twelve different services, 
comprise the present form. The twelve services listed below are
those ordinarily included in any balanced program. All are self- 
explanatory with the exceptions possibly of “Adjustment of the 
Institutional Program to Student Needs” and “Guidance in 
Student Conduct.” The former illustrates the point of view 
that no institution can have an effective student personnel
program unless the institution as a whole is functioning in the
interests of the same student needs that the personnel services 
are designed to serve. The five items in this area provide an in¬ 
dication as to whether or not the total institutional emphasis 
is in this direction. 

“Guidance in Student Conduct” is so stated in an attempt 
to place a particular emphasis on discipline. This emphasis is 
a counseling and learning emphasis in which students respond 
to items which indicate their sense of the justice of disciplinary
procedures, and the extent to which discipline is a learning
experience. If the policies relating to student conduct are consistent
with the belief that each student who violates a regulation
should be counseled and helped to learn from the experience,
with punishment following only (1) when punitive action
seems necessary for learning and (2) when necessary to restrict 
in the event no learning seems possible, then discipline can be 
a personnel service. 

The five items in each area have been designed with the 
thoughts of achieving (1) the maximum coverage and (2) the 
best possible representation of the service, using a minimum 
number of questions. The items have been reviewed with the 
above in mind by various trained workers in the student per¬ 
sonnel field. 

The twelve services and a sample item for each follow: 

Recruitment and Admissions 

Do you feel that, previous to your admission, representatives 
of this institution adequately explained to you the facilities 
of this campus? 

New Student Orientation 

Do you think that this institution made you as a new student 
feel a part of it and of its activities? 

Counseling Services 

Do you feel that students on this campus who most need 
counseling are receiving such help? 

Health Services 

Are you satisfied that your campus health authorities would 
handle your case competently, in the event you were injured 
or became seriously ill? 





Housing 

Do you feel that this institution is making sufficient effort 
to improve student housing facilities? 

Food Services 

As a rule, do you feel satisfied with the food served you at 
the campus cafeteria or dining hall? 

Extra-Class Activities 

Do you feel that there are enough student organizations and 
activities on the campus to meet the different needs of stu¬ 
dents? 

Adjustment of the Institutional Program to Student Needs 

Do you feel that your total college or university experience 
is such as to better prepare you for intelligent citizenship? 

Student Financial Aids and Part-Time Employment 

If you were “financially on the rocks,” would you feel free to 
go to the campus financial aid service for help and counsel? 

Placement Services 

Is your placement office making sufficient effort to keep you 
informed of current employment trends and needs? 

Student Personnel Records 

Are you of the belief that you are welcome to discuss with a 
counselor all matters contained in your student personnel 
folder? 

Guidance in Student Conduct 

Will a student on this campus get a chance to explain his 
side of the case if he is “called up” for discipline? 

The sixty items as they appear in the form have been randomized
in order to minimize bias and to insure a maximum
chance that each item will be answered independently.
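
One way such a randomization might be carried out is sketched below. It assumes only what is stated above (twelve services, five items each); the numbered item labels and the fixed seed are hypothetical conveniences, not part of the published form.

```python
import random

# The twelve services named in the text; item numbers 1-5 are placeholders
# standing in for the actual question wording.
SERVICES = [
    "Recruitment and Admissions",
    "New Student Orientation",
    "Counseling Services",
    "Health Services",
    "Housing",
    "Food Services",
    "Extra-Class Activities",
    "Adjustment of the Institutional Program to Student Needs",
    "Student Financial Aids and Part-Time Employment",
    "Placement Services",
    "Student Personnel Records",
    "Guidance in Student Conduct",
]
ITEMS = [(service, k) for service in SERVICES for k in range(1, 6)]


def randomized_order(items, seed=1950):
    """Return a shuffled copy of the items; the input list is left unchanged."""
    rng = random.Random(seed)  # private generator, so the order can be reproduced
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


form_order = randomized_order(ITEMS)
```

Keeping the service name attached to each item means the scrambled responses can still be regrouped by service when the form is scored.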

Administration of the Form.—It is recommended that a random
sample of at least 200 students of the local college or
university population be utilized in any study involving the use
of this form. In order to determine the needs of various groups
on campus, participants in the study are asked to check those
of the following which are appropriate for them.





____ Male
____ Female

____ Freshman
____ Sophomore
____ Upperclassman
____ Transfer Student

Major Department ____________

____ Live Off-Campus in Rooming House
____ Live at Home
____ Live in College Dormitory
____ Live in Fraternity or Sorority House


If one is to have a sufficiently large N from which to form 
judgments when considering any one of the above areas, it is 
necessary to have a reasonably large sample with which to 
begin. 
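
The sampling step, and the check on sub-group sizes that motivates a large sample, can be sketched as follows. This is a minimal illustration only: the roster, the field names, and the seed are invented for the example, and a simple random sample without replacement is assumed.

```python
import random
from collections import Counter


def draw_sample(population, n=200, seed=None):
    """Simple random sample without replacement from a list of student records."""
    if n > len(population):
        raise ValueError("sample size exceeds population size")
    return random.Random(seed).sample(population, n)


def subgroup_sizes(sample, key):
    """Count the sampled students in each category of one background item."""
    return Counter(student[key] for student in sample)


# Hypothetical roster of 3,000 students with a class-standing field.
STANDINGS = ("Freshman", "Sophomore", "Upperclassman")
roster = [{"id": i, "class": STANDINGS[i % 3]} for i in range(3000)]

sample = draw_sample(roster, n=200, seed=7)
sizes = subgroup_sizes(sample, "class")
```

Inspecting `sizes` before tallying shows at once whether any sub-group is too small to support a separate judgment.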

Participants in the study may indicate “Yes,” “No,” or “?” 
in answer to each of the sixty questions. All items are so worded
that if the service is functioning properly in the judgment of 
the student the “Yes” will be checked. If the service is inade¬ 
quate, the “No” will be indicated. 

The “?” is meant for use only in those cases where the student
has insufficient knowledge of (or experience with) the
service to make a “Yes” or “No” response. If an informed 
judgment of the adequacy of a service cannot be made, then 
use should be made of the “?”. 

Students are not asked to write their names on the form— 
only to answer the questions honestly and thoughtfully. 

Scoring of the Form.—A Tally Sheet is provided which allows
for (1) the tallying of responses to each item and (2) the group¬ 
ing of these item responses for each of the services. (Each of the 
twelve services has a maximum “Yes” score of 500 for each one 
hundred students who participate in the study.) 

Following completion of tallying, numbers of “Yes,” “No,” 
and “?” responses should be converted to percentages, using 
as the base N the total number of students participating. If one 
is considering only the responses of a sub-group, the number of 
students in that group should be used in computing the per¬ 
centages. 

If one wishes to consider only the “Yes” and “No” responses, 
i.e., only the definite judgments relative to the adequacy of the 
service, then one will need to use varying N’s in computing 
the percentages, due to the probable variation in “?” responses 
for the twelve service areas. 
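
The two ways of computing percentages can be sketched as follows. The counts used here are invented purely for illustration, and the function names are hypothetical. The first function uses the number of participating students (times the number of items pooled) as the base; the second uses only the definite “Yes” and “No” responses, so its base varies from service to service.

```python
def percent_of_all(counts, n_students, n_items=1):
    """Percentages of "Yes", "No", and "?" with all participants as base.
    For a whole service, pass n_items=5 so the base is 5 * n_students."""
    base = n_students * n_items
    if base <= 0:
        raise ValueError("base N must be positive")
    return {k: 100.0 * counts.get(k, 0) / base for k in ("Yes", "No", "?")}

def percent_of_definite(counts):
    """Percentages of "Yes" and "No" among definite judgments only.
    The base excludes "?" responses, so it differs between services."""
    base = counts.get("Yes", 0) + counts.get("No", 0)
    if base <= 0:
        raise ValueError("no definite responses to use as a base")
    return {k: 100.0 * counts.get(k, 0) / base for k in ("Yes", "No")}

# Hypothetical tally for one service (five items, 100 students).
counts = {"Yes": 300, "No": 100, "?": 100}
print(percent_of_all(counts, n_students=100, n_items=5))
# {'Yes': 60.0, 'No': 20.0, '?': 20.0}
print(percent_of_definite(counts))
# {'Yes': 75.0, 'No': 25.0}
```

The same service looks more favorable on the second basis (75 per cent against 60 per cent), which is why the base in use should always be reported alongside the percentages.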




Interpretation of Data. One can assume that the higher the
percentage of “Yes” responses for a particular service, the more 
adequate that service likely is in the judgments of the students. 

It is suggested that the services be regarded as adequate if the 
“Yes” responses approximate two-thirds or more of the total 
responses. (This is an arbitrary figure and a lower or higher 
percentage may be used if desired.) If less than two-thirds of 
the participants believe that the service is adequate, in terms 
of the five aspects of the service represented by the five items, 
then that service should be examined. 

The “?” responses are to be used when there is a lack of
familiarity with the service. The presence of even a low per¬ 
centage of such (15 per cent or over, let us say) indicates the 
need for better lines of communication to the students. Often 
students are poorly informed as to the existence of the services 
that are provided for them. 

The presence of a considerable number of “?” responses
should not be interpreted to mean inadequacy of the service 
itself, but, rather, to be indicative of the need for a program 
of selling and of informing students of the services available. 
Actually, to have a strong program of student personnel serv¬ 
ices means little unless the various aspects of the program are 
known and are functional in terms of meeting student needs. 
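
The two rules of thumb given above (a “Yes” share of about two-thirds for adequacy, and a “?” share of 15 per cent or more as a sign of poor communication) can be expressed as a small check. This is a sketch only: both cutoffs are the arbitrary figures suggested in the text and may be changed, and the function name and flag wording are hypothetical.

```python
from fractions import Fraction

ADEQUATE_YES = Fraction(2, 3)     # suggested (arbitrary) adequacy cutoff
UNFAMILIAR_Q = Fraction(15, 100)  # "15 per cent or over, let us say"

def review(yes, no, unsure):
    """Return a list of flags for one service, given its response counts.
    Exact fractions are used so the two-thirds boundary is not blurred
    by floating-point rounding."""
    total = yes + no + unsure
    if total <= 0:
        raise ValueError("no responses recorded")
    flags = []
    if Fraction(yes, total) < ADEQUATE_YES:
        flags.append("examine service")
    if Fraction(unsure, total) >= UNFAMILIAR_Q:
        flags.append("improve communication")
    return flags

print(review(300, 100, 100))  # ['examine service', 'improve communication']
print(review(350, 50, 100))   # ['improve communication']
print(review(400, 60, 40))    # []
```

The two flags are kept separate because, as noted above, a high share of “?” responses points to a need for informing students and not to inadequacy of the service itself.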

It is to be expected that underclassmen will indicate a lower 
percentage of “Yes” responses in the area “Extra-Class Ac¬ 
tivities” than will upperclassmen. Acquaintance with, and op¬ 
portunity for participation in extra-class activities, generally 
increase the longer one is on campus. 

Likewise, the percentage of “Yes” responses in the area 
“Placement Services” should be greater for upperclassmen. 
This service is especially designed for those approaching gradu¬ 
ation and has less meaning for underclassmen. A high percent¬ 
age of “?” responses should be expected of underclassmen in 
this area. 

The evaluator may wish to compare the percentages of 
“Yes,” “No,” and “?” responses of one group on campus with 
those of another (for example, dormitory personnel with off- 
campus students). By utilizing appropriate tests of significance, 
one can be confident that differences found to be statistically
significant are real and not the result of chance errors of sampling.
With such evidence at hand, one’s conclusions will have
greater meaning than they would, were there no statistical 
treatment of the data. 
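
The text does not name a particular test of significance. As one common choice, the sketch below applies a pooled two-proportion z-test to the share of “Yes” responses on a single item in two groups. The group sizes and counts are invented for illustration, and the function name is hypothetical. The test assumes independent observations, so it is applied here to one item (one response per student) and not to the five pooled items of a service.

```python
import math

def two_proportion_z(yes_a, n_a, yes_b, n_b):
    """Pooled two-proportion z-test.
    Returns (z, two_sided_p) for H0: both groups share one "Yes" rate."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; all responses identical")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: 140 of 200 dormitory residents and
# 110 of 200 off-campus students answer "Yes" to the same item.
z, p = two_proportion_z(140, 200, 110, 200)
print(round(z, 2), p < 0.01)  # 3.1 True
```

A small p-value indicates only that a difference of this size would be unusual under sampling error alone; it does not by itself show that the difference is large enough to matter in practice.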

Finally, in the interpretation of data, it is well to keep in 
mind the goals and particular emphases of the institution. If, 
for example, the college or university provides a limited budget 
for a particular service or for the entire organized student per¬ 
sonnel program, then it is probable that there will be a definite 
ceiling on the percentage of “Yes” responses for that service or 
program. Mention is made of the above because of possible 
criticism which may be inappropriately directed at certain 
capable student personnel workers who have inadequate pro¬ 
grams due to insufficient institutional support. On the other 
hand, one must always be objective and critical of any low 
“Yes” response and examine carefully the service to see if the 
maximum is being achieved within the framework and limita¬ 
tions provided by the institution. 

Summary 

In order to ascertain the worth of a product it is well to 
question the consumer of the product. Such is true with regard 
to student personnel services. Accordingly, a student reaction 
form, containing sixty questions, five each for twelve commonly 
accepted student personnel services, has been devised. Through 
study of the proportions of favorable and unfavorable responses 
to the questions asked, one can determine certain program 
strengths and weaknesses, insofar as students are concerned. 
Use of the present form also permits one to secure data relative 
to the institution’s success in actually making known to students
the student personnel program it offers.

REFERENCES 

1. Blaesser, W. W. “The College Administrator Evaluates Student
   Personnel Work.” Educational and Psychological Measurement,
   IX, Part II (1949), 412-428.

2. Brumbaugh, A. J. and Smith, L. C. “A Point Scale for Evaluating
   Personnel Work in Institutions of Higher Learning.” Religious
   Education, XXVII (1932), 230-235.

3. Hopkins, L. B. “Personnel Procedure in Education.” Educational
   Record Supplement, No. 3. Washington: American Council
   on Education, 1926.

4. Rohrer, J. H. “An Evaluation of College Personnel Work in
   Terms of Current Research on Interpersonal Relationships.”
   Educational and Psychological Measurement, IX, Part
   II (1949), 429-443.

5. Williamson, E. G. and Sarbin, T. R. Student Personnel Work in
   the University of Minnesota. Minneapolis: Burgess Publishing
   Co., 1940.

6. Wrenn, C. G. and Kamm, R. B. “A Procedure for Evaluating a
   Student Personnel Program.” School and Society, LXVII
   (1948), 266-269.



THE MEASUREMENT OF STUDENT CONCEPTIONS
OF THE ROLE OF A COLLEGE
ADVISORY SYSTEM

EDGAR Z. FRIEDENBERG
University of Chicago 

Most colleges and universities provide some kind of counsel¬ 
ing service for students. These services appear to have stemmed 
primarily from two functions: the organization of a student's 
program in such a way that requirements for degrees and for 
admission to professional schools may be met efficiently, and 
the enforcement of regulations deemed necessary by the college 
for the discharge of its responsibilities to students, parents and 
community. In many schools little connection has been per¬ 
ceived between these functions; program-planning occurs at 
registration, under the direction of the Faculty; breaches of 
regulations or aberrant and unsocial student behaviours are 
treated as disciplinary problems by the Dean of Students, or, 
even, separately by sexes in the office of the Dean of Men or 
Women, as in each case is appropriate. 

With the increased influence of psychology on professional 
education (3) has come greater insight into the unity of the 
educable personality (4, 5). As a consequence, the division be¬ 
tween emotional, disciplinary, and academic problems has been 
perceived as unreal (1, 8). Students make vocational choices
based on fantasy or emotional tension; students fail in programs
because of intrapunitive personality trends, hostility to au¬ 
thority, or inferiority feelings; students behave lawlessly out 
of an appetite for punishment which grows on what it feeds on, 
or in acting out fantasies so complex and deep-seated as to 
render disciplinary action, however severe, an extraneous fac¬ 
tor whose meaning is distorted by the same mechanism which 
precipitated the behaviour. Not all students do these things, of 
course, but, unless the admissions policy of a college is so ineffective
or rudimentary as to admit large numbers of students
who are simply too stupid to succeed, or too poor to have time
to study after they finish their part-time work, it is clear that 
emotional factors must be involved in most of the academic or 
disciplinary problems which do occur, whether these are con¬ 
fined to a small group of students or are prevalent in the student 
body as a whole. 

Even so, however, there remains the fundamental question 
of the degree of responsibility which a college has for the emo¬ 
tional welfare and personality structure of its students, and 
the administrative question of how such responsibility is to be 
discharged, if accepted. It is always possible to set up quasi- 
clerical bodies whose function is to excrete unsuccessful or 
unconforming students. Education is, however, a systematic 
process by which human behaviour is changed in directions 
which the student accepts and the faculty deems good and de¬ 
sirable. To limit the techniques of changing behaviour to those 
which can be applied from the lecture platform, and the effec¬ 
tiveness of education therefore to those who can immediately, 
realistically, and maximally profit by those techniques, seems 
short-sighted and intransigent and, in many cases, cruel. To do 
so with a student body composed in large part of youngsters 
seems irresponsible. 

At the University of Chicago, whose College, as is well known, 
accepts students after the second year of conventional high 
school, no such limitation has ever been considered. There is a 
complete student health service, extending from orthopedics 
to psychiatry. There is a Counseling Center, using client-cen¬ 
tered techniques (6), to which any student can turn, without 
charge, for assistance in “thinking through” questions with 
which he is concerned. There are conventional vocational guid¬ 
ance services. There is not, since all University facilities are 
thought of as contributing ultimately to intellectual develop¬ 
ment, a University Mortician; one can only say in defense of 
the lacuna that few students develop a need for the services of 
such an official while in residence and none has ever applied 
for them. 

Within the College of the University, and peculiar to it, is 
also the College Advisory System. This consists of a staff of 
approximately (the number varies slightly from year to year)
twenty advisers in the College, usually devoting from one-
fourth to one-third of a full-time assignment to the advisory 
service, and carrying a case load averaging 120 students. While 
it cannot be said that a systematic philosophy of advising 
underlies the system, it has adhered to certain principles since 
its inception. One of these is that all advisers shall be primarily 
members of the College Faculty, devoting their major effort to 
teaching or research. The purpose of this is to insure familiarity 
with the operations of the College, so that the adviser may dis¬ 
charge his administrative functions accurately. Another is that 
students need not be assigned to an adviser of the same sex; 
the purpose of this policy being to dispel the atmosphere of 
obsession with the erotic which has characterized many student 
personnel services of a more conservative orientation. A third 
is to assign each student, so far as possible, to an adviser with 
special qualifications in his intended field of professional or 
academic specialization; but since students in the College have 
virtually no opportunity to modify their programs of general 
education so as to contribute directly to their vocational goals, 
this policy has been modified so that students are not assigned 
special advisers until they are near to the completion of their 
work in the College or have made definite plans for advanced 
study or professional training. Students are not allowed to 
choose their adviser, but are usually, on their request, removed
from the list of the adviser to whom they have been
assigned and placed on the list of the adviser whom they prefer, 
or whose special academic field is the one in which they are 
most interested, if he has room for them. 

New students normally meet their advisers for the first time 
at a twenty-minute registration conference at the opening of 
the year; students are admitted to the College only at the 
opening of the Autumn Quarter. At this time the adviser 
registers the student for an entire year, and files, without dis¬ 
cretion, on the basis of placement-test results, the student’s 
program for the Bachelor’s degree. No administrative device 
has succeeded, despite much thought and worry, in making 
these conferences anything but rushed and unsatisfactory; an 
inept adviser can, during registration, infuriate, confuse, or 
frighten as many as sixty new students, although the average
is doubtless somewhat below this number. After registration,
students may sign up for a fifteen-minute appointment with 
their adviser at any time they wish, for any reason they wish, 
during the period of eight to ten hours per week which the ad¬ 
viser allots for the purpose. They may also be summoned to see 
the adviser at his discretion, almost always to discuss academic 
problems. The adviser’s signature must be obtained to any 
change of registration initiated by the student. 

It may be seen, therefore, that the College Advisory System 
operates in an almost totally academic context. In a school 
which admits only intellectually qualified students, and pro¬ 
vides fairly generously for assistance to those who need it, most 
academic problems, however, seem to originate in a disordered 
perception by the student of his situation and responsibilities, 
accompanied, of course, by the underlying anxieties and regres¬ 
sions which give rise to the need to misunderstand. There is a 
question, then, as to how much insight into the emotional 
origins and significance of academic problems a subject-matter 
specialist can be expected to acquire in order to be most helpful 
in solving them. But there is a deeper and more controversial 
question than this on which the responsible adviser must take 
a position. In every college there are a number of students 
whose academic success is enhanced, rather than hindered, by 
aspects of their personality which seem likely to result in great 
ultimate unhappiness. There are students who use preoccupa¬ 
tion with abstract theoretical material to distract themselves 
from personal and social inadequacies. There are students who 
seek academic distinction in order to flaunt it in defiance of a 
culture which they believe to disparage it. There are students 
who are convinced that they can only be valued because of 
their scholastic achievements, and who are ceaselessly driven 
to seek grades as copper tokens to exchange for affection at a 
very unfavorable rate. What is the responsibility of the adviser 
for the welfare of such students? Must he train himself to rec¬ 
ognize them? If he can recognize them, should he seek to ini¬ 
tiate personality changes which will probably make the stu¬ 
dent’s academic record less spectacular, even if they also result 
in greater happiness and ultimately greater productivity and 
creativeness? 





The answer to such questions depends on a complex hierarchy 
of values, which certainly cannot be established by empirical 
investigation alone. It is clear, however, that student expecta¬ 
tions of the Advisory System are one of the factors which must 
affect the decision. No administrator can build an advisory 
service in response to student demand, which is always partially 
conflicting and made in partial ignorance of the administrative 
limitations of the particular situation. If, however, a certain 
kind of service is believed by students to be a responsibility of 
the Advisory System, although no administrative provision is 
made for it, a situation which will engender hostility, and which 
is dangerous if the service is important, exists. On the other 
hand, if students are convinced that a particular kind of serv¬ 
ice is not the responsibility of the Advisory System, and would 
not seek it there even if it were offered, that service can prob¬ 
ably not be offered to students effectively within the System, 
particularly if it is a counseling service which must, ultimately, 
always be voluntarily received. 

The author, therefore, sought to develop an instrument which 
would measure four things: (1) student opinion of the scope
desirable in the College Advisory System; (2) student informa¬ 
tion about the system as it actually exists, to permit an estimate 
of the degree to which criticism and opinion might be regarded 
as informed; (3) student evaluation of the effectiveness of the 
System in solving certain problems which it recognized as pos¬ 
sible sources of weakness in itself; and (4) an indication of the 
kind of role with respect to themselves students believe an 
adviser should play in assisting in the solution of certain com¬ 
plex problems. Since this information seems to be among that 
which would be needed by any college in evaluating its advisory 
services, the instrument used to gather it will be described and 
illustrated in some detail. (Copies of the complete instrument 
may be obtained from the author on request.) It consists of a 
group of five batteries of objective questions, with space pro¬ 
vided for additional focussed written comment by students; 
the entire instrument requires something under two hours for 
most students to complete. The first battery consists of nine 
questions which elicit only vital statistics—age, position in the 
college, frequency with which student consults adviser, etc.
Because of the unique mode of organization of the College,
few of these questions would be applicable intact to other situa¬ 
tions, and they will not be reproduced here. Since IBM electro¬ 
graphic answer sheets were used, questions were numbered so 
as to facilitate analysis, and the next battery began with item 
16. Most of it is reproduced, as follows: 

Below you will find listed certain problem situations which 
are encountered with varying degrees of frequency among Col¬ 
lege students. Among the resources to which a student at the 
U. of C. might turn for assistance with each of these problem
situations is his College Adviser. In considering each problem
situation, feel free to draw on your own experiences with the
College Advisory System, or other information which you 
believe to be valid, but try in every case to give a reasonably 
generalized response, based on your conception of the system 
as a whole. For each of the situations listed, on your answer 
sheet blacken space 

A. if you believe the College Adviser to be the best person 
from whom to seek help in such a situation. 

B. if you believe that the College Adviser would be the best 
University staff member from whom to seek help in such 
a situation, though probably less effective than experts 
available elsewhere (e.g., a private psychoanalyst or firm 
specializing in vocational placement). 

C. if you believe that the College Adviser might be of some 
help in such a situation, and that you might go to him if 
you had special respect or friendship for him, but be¬ 
lieve that there are other more appropriately trained and 
chosen University officials who could be of greater assist¬ 
ance. 

D. if you believe that some University official should be 
available to help in such a situation, but that a College 
Adviser, either because of deficiencies in training, insight,
or interest, or because his responsibilities are divided
between the student and the institution, might be an
indifferent or even dangerous source from which to seek it.

E. if you cannot conceive that the University has any respon¬ 
sibility to help a student with such a problem, and do 
not believe that this student should seek help from any 
University official. 

PROBLEM SITUATIONS 

16. Student is fearful of failing his comprehensive examina¬ 
tions, even though he has been working and has made 
passing grades in the Autumn and Winter Quarters. 

17. Student must work to remain in school, and finds that in 
order to clear enough time to keep a job, he must petition 
to get into sections of classes that are listed as closed. 




18. Student has stolen an automobile and later abandoned it. 
He has not been detected, but fears that he may be, and 
anxiety is disrupting his work and his life. 

19. Student is making mostly C’s, with an occasional D and 
still less frequent B. The death of his father makes it im¬ 
possible for him to continue in school without substantial 
financial aid. 

20. Student cannot bring himself to study; if he sits at his desk 
and attempts to do so, his mind wanders off into daydreams.
If he attempts to write a required paper, or other
written exercise, the blocking is particularly intense. 

21. Student wishes to enter medical school in the shortest 
possible time, and wants help in planning his program of 
studies most efficiently. 

22. Student is uncertain whether the qualifying examination 
in Humanities 1 (Special Art) can be taken as part of a
sequence culminating in Humanities 3 (German) in fulfill¬ 
ment of the requirements for the A.B. degree, and if so, 
whether Language 1 is still a requirement or not. 

23. Student has gotten into serious difficulty as a consequence 
of sexual relations, and is now in a state of panic at the 
prospect of having to choose between an undesired mar¬ 
riage or exposure and parental discipline. 

24. Student, not living in a residence hall, has participated 
in a group which went to a Gerald L. K. Smith meeting 
to break it up. Eggs were thrown, and the student is now 
being held by the police. 

25. Student has a mild interest in becoming a lawyer, which 
is in accord with his parents’ wishes. He is not certain 
that his interest is very real, or that he has the pattern 
of abilities which lead to success in this field, and is begin¬ 
ning to feel anxious. 

26. Student is troubled with severe headaches, of undeter¬ 
mined origin, which are making it impossible for him to 
study and causing him to fail his work. He notices that 
they are followed by periods of listlessness and depression. 

27. Student has purchased a portable typewriter from a store 
in the University community, and signed an installment 
contract to pay for it. He has found several mechanical 
defects in the machine, and wishes to return it and get his 
money back. The store, however, threatens to sue him 
for the balance of the money. 

28. Student does not understand the process by which his 
placement has been made and wishes to have the meaning 
of his placement scores explained to him, as he feels he 
should have been excused from Mathematics 1 and Social 
Sciences 2. 

29. Student has developed a very strong emotional attach¬ 
ment to his roommate, who is now no longer willing to 
“pal around” with him as he did at first. The roommate has
requested a change of room assignment, and the student
is troubled by suicidal impulses, and terrifying dreams in 
which he is murdered by his former friend. 

The reader will doubtless grant that nearly every type of 
problem is represented in the battery, from the purely academic 
to the highly aberrant and clinical. These last were included, 
not because an adviser is likely to encounter them but in order 
to permit students to express the most extreme demands pos¬ 
sible on an Advisory System if they wished. 

The next battery—the only portion of the instrument to 
which a right answer “key” in the usual sense of examining is 
possible—consisted of twenty true-false statements about the 
Advisory System. Examples are “Penalties may be invoked to 
compel a student to register for those College courses which 
his adviser recommends that he take during a particular year,” 
“Most College advisers carry a case load of approximately fifty 
students,” “College advisers receive special training in the real¬ 
istic handling of the emotional problems of students.” 

The fourth battery, consisting of 15 questions, would be 
adaptable to almost any academic situation, and is reproduced 
below in its entirety. 

In the College Advisory System, as in every administrative 
structure, the performance of the functions characteristic of 
that system is limited by problems of facilities and procedures. 
Sometimes these limitations can be overcome by ingenuity 
and special techniques; often they persist as sources of dissatis¬ 
faction to staff and clientele alike. 

Below you will find listed a series of such limitations which 
you may or may not feel apply to the College Advisory Sys¬ 
tem. In considering each limitation, feel free to draw on your 
own experience with the College Advisory System, or other 
information which you believe to be valid, but try in every
case to give a reasonably generalized response, based on your 
conception of the system as a whole. For each of these, on your 
answer sheet blacken space 

A. if you feel that this problem is almost always satisfac¬ 
torily overcome by the College Advisory System, or is 
one with which it should not be concerned anyway. 

B. if you feel that the problem is often satisfactorily over¬ 
come by the College Advisory System, but is neverthe¬ 
less the source of occasional annoyance. 

C. if you feel that the problem is recognized by the College 
Advisory System, but is mishandled about as often as it is 
solved, or has been solved by halfway measures. 





D. if you feel that the problem is one which may usually 
be expected in contacts with the College Advisory Sys¬ 
tem, although you are occasionally surprised by success¬ 
ful handling of it. 

E. if the problem is almost always troublesome in student 
contacts with the College Advisory System to which it 
is related, and there is no satisfactory evidence of effective 
attempts to solve it. 

61. Providing of enough time at each interview to permit 
students to complete the business for which they sought 
an appointment. 

62. Keeping individual advisers close enough to their schedules
that students need not wait too long for their appoint¬ 
ment, or miss class time because of late advisers. 

63. Finding persons to serve as advisers who are warmly in¬ 
terested in students and their problems, and who know 
their students as individuals. 

64. Keeping the case load per adviser low enough to permit 
advisers to get really acquainted with their advisees and 
their problems. 

65. Keeping student conference material confidential, and not 
revealing it to persons who might use it in damaging ways. 

66. Knowing accurately the right members of the University
to whom to refer students with special problems—e.g., 
reading deficiencies, or presumed errors in recording com¬ 
prehensive results—and helping students to get in touch 
with those people. 

67. Providing office facilities which insure as much privacy as 
students need in order to discuss freely with their adviser 
such problems as they wish. 

68. Assigning as advisers persons with sufficient insight into 
the emotional and developmental tasks of young people to 
really understand what’s going on inside them.

69. Keeping records sufficiently up-to-date, accurate, and 
available that advisers do not act on mis-information. 

70. Conveying to students an attitude of respect for them as 
people, and conducting interviews with courtesy and gen¬ 
uine friendly feeling. 

71. Getting advisers to shut up long enough to permit students
to express their own feeling about problems fully.

72. Assigning as advisers persons of sufficient maturity that
they need not “use” students emotionally, by bullying,
identifying too much with them and their problems, making
demands on the student for liking or admiration, or in
other, more subtle, ways. 

73. Providing advisers sufficiently mature emotionally to listen
to any problem students might wish to discuss with them 
without becoming “shocked” or frightened, or attempting 
to impose standards of conduct which the student does not 
accept. 




74. Scheduling sufficient hours per adviser that students can 
get to see an adviser when they need to, without having 
to wait for attention with their problem unsolved. 

75. Providing sufficient information on “summons” forms that
students are not caused needless anxiety as to the possibil¬ 
ity that they may be in trouble. 

76. Limiting the scope of the adviser’s activity sufficiently 
that students are not obliged to discuss with him matters 
which are not properly his business. 

The fifth battery, although it contains but five items, is per¬ 
haps the most interesting in the questionnaire. It is intended 
to appraise the role which students think it appropriate for the 
adviser to fill, and consists of fictitious case studies, each of 
which presents a rather serious student problem, followed by 
a choice of five courses of action which the adviser, confronted 
by such a problem, might take. The student is asked to indi¬ 
cate the choice he believes best, and is given space for written 
comments in which to suggest other courses he might judge 
preferable. The items follow: 

91. Student is afraid that he will fail comprehensive examina¬ 
tions in German and Mathematics. In the course of his 
first interview with the adviser, he reproaches himself 
severely for his failure to study, but states that, as soon 
as he begins to try to do so, his mind wanders off into day¬ 
dreams. He is a good jazz musician, and is in demand by 
many of his former high-school friends to lead a small or¬ 
chestra at their social events. When he agrees to do this, 
his parents attack him, pointing out that he has never 
been as smart as his elder brother, that he is wasting his 
time and their money, would probably have a hard time 
succeeding at the University of Chicago in any case, and 
must surely transfer to an easier school if he fails an ex¬ 
amination. 

The boy, as he tells this story, seems much hurt and uncertain,
but is inclined to agree with the low estimate
placed by his parents on his character and intelligence.
Entrance aptitude test scores secured by the University 
place him well among the upper tenth of applicants ad¬ 
mitted. 

A good adviser would 

A. sympathetically but firmly support the parents’ de¬ 
mands on the boy, advising him to give up the or¬ 
chestra until he is more certain that he can carry 
his school work. 





B. tell the boy unemotionally that the decisions must 
be his, but reiterate for him the precise requirements 
for continuing registration in the College. 

C. say only enough to make it clear to the boy that his 
feelings of anxiety, rejection and conflict are under¬ 
stood and accepted. 

D. sympathetically point out that the boy has a right 
to make any decisions about his total program of 
activities which will best satisfy him, while making 
sure that he understands both the conditions under 
which he may continue in school and the real abil¬ 
ities he has been shown to possess. 

E. point out that the key to the situation is probably 
the hostility his parents feel toward him, as shown 
by their desire to underrate him, and his resultant 
fear that, should he succeed, they will completely 
reject him. 

92. Student, an eleventh-grade entrant, seventeen years old, 
has been placed on probation because of a failure to at¬ 
tend required physical education classes. She is also failing 
two of her subjects. The instructor in one of these has 
turned in a sympathetic report, indicating that he be¬ 
lieves the girl to be intelligent and creative, but too much 
burdened by her personality difficulties to accomplish 
much at this time. The other report is aggressively crit¬ 
ical, describing the girl as unkempt and lazy, and declaring 
that she has no place in the College. At the conference to 
which she is summoned, the girl appears shy, nervous, and 
so far as possible, uncommunicative. 

A good adviser would 

A. point out to her in a kindly but resolute way that 
she will surely be dropped from school if she does not 
make a better academic adjustment, and help her 
to schedule her week’s work so that she can begin to 
make effective use of her time. 

B. restate to her, in as neutral a tone as possible, the 
conditions under which her registration may be ter¬ 
minated, but emphasize that the decision must be 
hers. 

C. let her know that he understood that she must be 
feeling threatened and unhappy and express clearly 
a wish to help her understand her own feelings bet¬ 
ter, while pointing out calmly that they must also 
meet the practical situation in which she is involved 
in order to go on working together. 

D. suggest that she drop the course taught by the hos¬ 
tile instructor, and use the extra time to catch up 
on her other work. 

E. point out to her that her unkemptness, laziness, and 




uncooperative attitude are quite evidently ways of 
rebelling against authority and are almost certainly 
derived from her feelings about her parents rather 
than from any real aspects of her College situation. 

93. The program of an 11th-grade entrant has been erroneously
prepared by his registration adviser, who checked
Biological and Physical Sciences rather than Natural Sciences
1, 2, and 3, as requirements for his degree. The error
is noted shortly before the beginning of the student’s sec¬ 
ond year in the College, and the student is notified that the 
requirement has been changed and that he must now take 
the Natural Sciences sequence. The student has not yet 
registered for either Biological Sciences or Physical Sci¬ 
ences, and could not have begun work on Natural Sciences 
1 during the previous year because of poor mathematics 
placement, so that he has not, in fact, suffered as yet by 
the error. He is nevertheless quite upset by the change, as 
he wishes to enter an engineering school, believes that 
Physical Science will serve him in better stead than Nat¬ 
ural Sciences 1, does not want to take an additional 
comprehensive, and is angry about the inefficiency of the 
adviser in making such an error. He comes in to ask that 
the original statement of his degree requirements be kept in 
force.

A good adviser would 

A. apologize for his carelessness in making the error, 
but point out that since it has not as yet affected the 
student’s program, the requirement should stand 
as corrected.

B. state firmly that error or no error, the degree requirements
for 11th-grade entrants are uniform and must
be consistently administered.

C. note carefully the student's reasons for wanting to 
keep the old requirements in force, then take the 
matter to the Dean of Students in the College, admit 
that the original error was his, and ask the Dean to 
stand behind the old requirements. 

D. himself prepare an amended program for the stu¬ 
dent, reaffirming the original requirement, and send 
a copy of it to the Registrar for recording. 

E. point out to the student that it is irrational for him 
to be angry over an error which has, in fact, done 
him no harm, and try to help him to gain insight 
into the true sources of his annoyance. 

94. An 11th-grade entrant has a schedule which requires that
he take Physical Education at 1:30. He schedules a conference
with his adviser at which he complains, with some
indignation, that this program is not acceptable to him, 
because it interferes with his freedom of worship. It has 




been his custom, since the age of ten, to read a chapter of a 
religious work daily after lunch; if he does not do so, his 
food disagrees with him, and he suffers from bloating and 
heartburn. He believes it to be dangerous to his health to 
take exercise while in this condition, but maintains stoutly, 
and unasked, that this does not bother him at all, since he 
is prepared to meet his Maker at any time. He does, how¬ 
ever, insist that, rather than risk the moral obloquy thus 
involved, he will simply refuse to attend physical education
classes. There is no way to arrange his schedule so
that he can either lunch at 11:30 or take Physical Education
then without either petitioning for admission to three
closed class sections or getting the Physical Education 
Department to make an exception to its rule and let the 
student come two days a week at 11:30 and two days at 
1:30. 

A good adviser would 

A. let the boy go ahead and petition, regardless of the 
improbability that three petitions would be granted 
for such a reason, in the hope that he might change 
his mind when finally confronted with so nearly im¬ 
personal a reality. 

B. attempt to persuade the Physical Education De¬ 
partment that the boy’s emotional need is impor¬ 
tant and real, and that it should make an exception 
in this case. 

C. say neutrally and dispassionately to the boy that the 
University does not recognize this kind of fantasy 
as religious in character, and cannot accommodate 
itself to such diversity of need; tell him frankly that 
if he does not attend compulsory physical education 
classes, he will be removed from the College. 

D. tell the student that it is pretty clear that some fac¬ 
tor besides religious conviction is operating to pro¬ 
duce symptoms of this kind, that the responsibility 
of the University to him and his parents requires 
that it insist he report to Student Health for a com¬ 
plete medical and psychiatric examination, and that 
his program may more profitably be discussed in the 
light of the report which Student Health will make.

E. discuss with the student the religious meaning of 
his position, pointing out that it must derive from 
an unusual conception of God, and suggesting that 
he scrutinize his own emotional needs as the source 
of the conflict. 

95. A twenty-year-old student, who entered the College at the
13th-grade level at the opening of the previous scholastic
year, is making satisfactory grades, both on his comprehensives
at the end of his first year and on quarterly examinations.
Reports from his instructors in Humanities 2
and History of Western Civilization commend him for his 
brilliant contribution to discussion, and his evident capac¬ 
ity to integrate the material offered into abstract general¬ 
izations. Reports from his instructors in Biological and 
Physical Sciences indicate that he has hardly ever at¬ 
tended classes in these courses, although he passed the 
comprehensive in Biological Sciences with a grade of C. 

The student’s adviser, in an informal discussion with the 
Head of the residence hall in which the student lives,
learns, however, that the student is regarded by the Head
as somewhat lacking in emotional adjustment. He has 
taken no interest in House social activities, and, so far as 
is known, has few social interests of his own. His friend¬ 
ships within the House are confined to two other boys, 
with whom he has discussions nearly every night center¬ 
ing on the Marxist interpretation of the motivations of 
contemporary politicians, or the unity and structure of 
contemporary drama, or the nature of reality. He has 
twice been sent back to his room from the dining hall 
because he came in to dinner without coat or tie. 

A good adviser would 

A. do nothing about the situation, on the grounds that 
he has no right to interfere with what evidently 
represents the boy’s free choice of behavior, so long 
as he is academically successful. 

B. summon the boy for a general discussion in the 
course of which he would expect to describe to the 
boy in detail the range of interesting activities avail¬ 
able at the University. 

C. attempt to show the House Head that the behavior 
of the boy might very well indicate more complete 
achievement of the objectives of the College than 
that shown by nominally better adjusted students, 
and urge him to encourage the boy's present mode of 
self-expression. 

D. summon the boy for a conference in which he would 
cautiously attempt to estimate how happy the boy 
really was, and, if considerable anxiety and unhap¬ 
piness were indicated, try to get him to discuss the 
possibility of seeking help from the Counseling Cen¬ 
ter or a psychiatrist. 

E. summon the boy and explain to him that his present 
behavior shows serious maladjustment, is probably 
more the result of his need to rebel against the patterns
of middle-class behavior established by his
parents than of a serious interest in his studies, and 
suggest that he work the problem through with the 
adviser. 




Detailed results of the administration of the instrument will 
not be presented here, since it is hard to see how they would 
be of more than local interest. A brief account will be given, 
however, as an example of the way the questionnaire may be 
handled, and the kind of results to be expected from it. 

A letter describing the questionnaire, and stating that it had
been prepared jointly by the Offices of the Dean of Students
and University Examiner, was sent to every seventh student
on the list of each adviser, requesting him to come fill out the
instrument at his choice of four specified times. Since independent
results were wanted, this seemed more desirable than
sending the instrument to the student, who would, in many 
cases, have then filled it out in consultation with others. At 
the time this was done, the Chicago Maroon, the official student
newspaper, editors of which had been present at all sessions
where the questionnaire had been planned, carried editorials
urging student co-operation. A total of 161 students, or slightly
less than half of those who were invited, filled out the questionnaire.
The composition of this sample was scrutinized by
the Dean of Students in the College, who declared it to be 
adequately representative, so far as crude statistical factors, 
i.e., length of residence in the college, age, sex, level of ad¬ 
mission, etc., were concerned. The sample could not, however, 
have been representative of student attitude, since it is quite 
clear that the large proportion of students who did not respond 
must have felt differently about the Advisory System than 
those who were willing to give it some time. One would as¬ 
sume, in the absence of more specific information, that stu¬ 
dents who felt most strongly about the system, whether posi¬ 
tively or negatively, would be likely to respond, while the 
indifferent would ignore the request; such an inference could be 
checked only by an aggressive interviewing program in which 
contact was established with a good sample of those who 
refused to co-operate. 

The results on the objective portion of the instrument cited 
in this article, for the total group of 104 boys and 57 girls 
responding, are presented in the following table. For items 
16-29, the figures given refer to the number and percentage of 
students marking the item A, B, C, D, or E. For items 31-50, 
the figures are a frequency distribution showing the number of
students making various total scores on this twenty-item true-false
“test” of information. For items 61-76 the same information
is given as for 16-29, with two additions. These items, as
the reader may perceive by referring to them, constitute a rating
scale on which students appraise various problems which
the Advisory System may have met more or less effectively. 
Space A represents a highly favorable appraisal on a particu¬ 
lar item, space B a moderately favorable one, space C a neutral 
or ambivalent one, space D moderately unfavorable, and space 
E highly unfavorable. In order to provide some quantitative 
indication of the relative success of the system in solving these 
problems, the following device was invented. The number of
students choosing to rate each item A was multiplied by 3;
the number of students marking it E, by -3. Those marking
it B were counted in as 1, those marking it D, as -1, while
C responses were ignored, that is, multiplied by zero. The products
thus obtained were added algebraically for each item, and the
resulting sum is reported as a Derived Score in the
column D.S. The Rank column simply indicates the rank of
these scores, a low number indicating a highly favorable student
response to this aspect of the service. The maximum possible
score would be 3 × 161, or 483; the minimum, -483.
Astonishingly, but gratifyingly, no negative scores were obtained.
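
The weighting device just described can be illustrated with a minimal sketch. The function and variable names below are illustrative assumptions and do not come from the study; the response counts are those read from Table 1 for items 65 and 72, whose printed Derived Scores the sketch reproduces.

```python
# Weights stated in the text: A = +3, B = +1, C = 0, D = -1, E = -3.
WEIGHTS = {"A": 3, "B": 1, "C": 0, "D": -1, "E": -3}


def derived_score(counts):
    """Return the weighted algebraic sum of response counts for one item."""
    return sum(WEIGHTS[option] * n for option, n in counts.items())


def rank_items(scores):
    """Rank items so that 1 marks the highest (most favorable) score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


# Response counts for items 65 and 72 as read from Table 1.
item_65 = {"A": 120, "B": 24, "C": 5, "D": 4, "E": 0}
item_72 = {"A": 115, "B": 37, "C": 4, "D": 1, "E": 0}

scores = {65: derived_score(item_65), 72: derived_score(item_72)}
print(scores)              # {65: 380, 72: 381}
print(rank_items(scores))  # {72: 1, 65: 2}
print(3 * 161)             # 483, the maximum if all 161 students marked A
```

Because C responses and omissions both contribute zero, an item can approach the maximum of 483 only when nearly every respondent marks space A.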

Similar data have been gathered for six subgroups of the
population which took the questionnaire. These groups are:
54 entrants of 1948, who had had but a few weeks' experience
with the College; 41 11th- and 12th-grade entrants aged 18 or
younger; 31 students who had been assigned to three or more
advisers during the course of their college career; 62 students
answering correctly eleven or fewer of the twenty true-false
information items; 35 students blackening space E (maximally
unfavorable) for two or more of items 61-76; and 97 students
choosing unpopular responses, that is, responses other than
91D, 92C, 93A or C, 94B or D, or 95D, on two or more of the
“case-study” items 91-95.

TABLE 1

Performance of 161 Students Responding to Invitation to Complete the Questionnaire
Evaluating Their Conception of the College Advisory System

Responses, Items 16-29

Item       A            B            C            D            E
         N     %      N     %      N     %      N     %      N     %
  16     57  35.4     30  18.6     58  36.0     10   6.2      5   3.1
  17    134  83.2      4   2.5     15   9.3      6   3.7      0   0.0
  18      3   1.9     28  17.4     30  18.6     47  29.2     51  31.7
  19     60  37.3     11   6.8     60  37.3     19  11.8      9   5.6
  20      9   5.6     74  46.0     35  21.7     34  21.1      9   5.6
  21    106  65.8      2   1.2     40  24.8     12   7.4      0   0.0
  22    153  95.0      2   0.6      5   3.1      0   0.0      0   0.0
  23      1   0.6     17  10.6     29  18.0     43  26.7     69  42.8
  24      5   3.1      8   5.0     33  20.5     39  24.2     72  44.7
  25     24  14.9     46  28.6     71  44.1     15   9.3      2   1.2
  26      0   0.0     54  33.5     45  28.0     43  26.7     17  10.6
  27     11   7.4     10   6.2     44  27.3     36  22.4     56  34.8
  28    143  88.8      2   1.2     12   7.4      3   1.9      0   0.0
  29      0   0.0     66  41.0     33  20.5     46  28.6     16   9.9

Distribution of Scores, Items 31-50

Score     N     %
 0-1      0   0.0
 2-3      1   0.6
 4-5      2   1.2
 6-7      8   5.0
 8-9     11   6.8
10-11    40  24.8
12-13    43  26.7
14-15    44  27.3
16-17    10   6.2
18-19     1   0.6
Omit      1   0.6

M = 12.2   σ = 5.04

Responses, Items 61-76

Item       A            B            C            D            E        Rank   D.S.
         N     %      N     %      N     %      N     %      N     %
  61     65  40.4     68  42.2     16   9.9      5   3.1      6   3.7     10    204
  62     47  29.2     83  51.6      9   5.6     14   8.7      6   3.7      9    206
  63     24  14.9     60  37.3     42  26.1     21  13.0     10   6.2     13     81
  64     21  13.0     44  27.3     46  28.6     28  17.4     17  10.6     15     18
  65    120  74.5     24  14.9      5   3.1      4   2.5      0   0.0      2    380
  66     67  41.6     50  31.0     26  16.1      8   5.0      4   2.5      8    231
  67     64  39.8     41  25.5     25  15.5      7   4.3     18  11.2     11    172
  68     26  16.1     45  28.0     40  24.8     29  18.0     14   8.7     14     52
  69     67  41.6     66  41.0     13   8.1      9   5.6      1   0.6      7    255
  70     98  60.9     45  28.0     12   7.4      4   2.5      0   0.0      3    335
  71     96  59.6     49  30.4      5   3.1      5   3.1      2   1.2      4    326
  72    115  71.4     37  23.0      4   2.5      1   0.6      0   0.0      1    381
  73     91  56.5     40  24.8     12   7.4      8   5.0      2   1.2      6    299
  74     27  16.8     64  39.8     34  21.1     14   8.7     20  12.4     12     85
  75     31  19.2     43  26.7     25  15.5     14   8.7     37  23.0     16     21
  76    101  62.7     24  14.9      9   5.6      2   1.2      3   1.9      5    318

Responses, Items 91-95

Item       A            B            C            D            E
         N     %      N     %      N     %      N     %      N     %
  91     10   6.2      5   3.1      0   0.0    135  83.8      5   3.1
  92     22  13.7      5   3.1    110  68.3     12   7.4      3   1.9
  93     54  33.5      1   0.6     67  41.6      5   3.1     24  14.9
  94     10   6.2     49  30.4      1   0.6     71  44.1     18  11.2
  95     33  20.5     23  14.3     10   6.2     80  49.7      1   0.6

Three different kinds of free responses were sought from each
student. The first and major source of these was the following
paragraph, presented at the close of the objective portion of
the material:

On this sheet please suggest any specific changes in the Col¬ 
lege Advisory System which you believe would increase its ef¬ 
fectiveness. Feel free to suggest any that seem important to 
you. It is suggested that you center your thinking around such 
possible areas of change as: 

1. Professional qualifications of advisers. 

2. Case load of advisers.

3. Scope of advisory service, i.e., increasing or decreasing
the range of kinds of problems with which advisers deal.

Do you feel that advisers, as they now function, are a 
threat to freedom or privacy of students? Do you, on 
the other hand, feel that they are too much concerned 
with routine academic problems to offer you the help you 
need? What changes would you suggest? 

4. Intercommunications between Instructors, House Heads, 
and Advisers. 

5. Means of establishing the working relationship between 
student and adviser as soon as possible. 

Students were also asked to list any characteristics of the 
Advisory System not included in items 61-76 which seemed to 
them especially worthy of favorable or unfavorable comment, 
and to state any specific course of action which they would 
prefer to any of the 5 listed, with reference to items 91-95. 
These comments have been examined rather carefully, and, so 
far as they are subject to classification, tallied quantitatively. 

Twenty-five (of the 161) students responding to the para¬ 
graph quoted above expressed a feeling that the case load of 
advisers should be limited—the most common single suggestion 
made. Twenty-three felt that advisers should be warmly in¬ 
terested in students, and 17 felt that more attention should be 
given to personal problems of students. Sixteen students felt 
advisers should receive training in adjustment counseling pro¬ 
cedures, or psychology, while five more also felt this to be the 
case if emotional problems of students were really a part of the 
advisory responsibility, but were not quite prepared to concede 
that they were. 

On the other hand, a smaller group of students displayed contrary
and apparently more intense feeling. Eleven students felt
that advisory and counseling services should be kept separate,
or that the adviser should not be concerned with personal problems,
while one student put the feeling on the basis that advisers
should not deal with problems which students would
ordinarily discuss with parents. Fifteen students took what 
might be termed a middle position, viewing the advisory func¬ 
tion as mainly academic, but feeling that advisers should be 
able to direct students intelligently for help when needed. A 
similar and related contrast was apparent in student wishes 
concerning the degree of interrelationship between advisers and 
dormitory and academic staff. Fourteen felt this should be in¬ 
creased, while nine felt this to be undesirable. 

Of particular interest was the extent to which students con¬ 
ceived the Advisory System as playing an important role in 
interpreting the purposes and values of the University of Chi¬ 
cago College Plan to them—a function which, it must be ad¬ 
mitted, was almost completely ignored in the instrument itself. 
This feeling was expressed in a variety of ways, and is there¬ 
fore less conspicuous on the tally than it would have been had 
the attitude found expression in a single, often-repeated sentiment.
Nine students expressed a direct wish for assistance in
the synthesis and interpretation of their College learning experiences.
Six, with less positive feeling, expressed a need
for more assistance in orienting themselves to the University.
Six also wanted this help specifically in connection with the 
function of the Advisory System itself, with two expressing 
definitely the feeling that the System should more clearly define 
and state its own purposes and limits. Some anxiety was ex¬ 
pressed at the failure of the College to take students’ vocational 
ambitions adequately into account; five felt that special ad¬ 
visers trained in a professional field, e.g., for pre-medical stu¬ 
dents, should be assigned as needed; four, that more help should 
be given the student in making plans to enter a Division on 
completion of general education. There was surprisingly little
complaint about the non-voluntary nature of the system; only 
four students wished to be allowed to choose their own adviser, 
while nine asked that regular meetings be scheduled at inter¬ 
vals, regardless of their felt need, in order to check on their 
progress. 

Few items were added to those in 61-76 of the instrument
by student commentators, and none by more than four persons. 





Rather curiously, four students stated here a belief that stu¬ 
dents should have an adviser of their own sex; three wanted 
more technical advice about planning for a vocation; three 
felt that advisers should be commended for trying to help stu¬ 
dents with personal problems, and should do more of it; and 
three felt that warmer, friendlier relationships would be desir¬ 
able; two wanted more psychological or clinical training. On 
the other hand, three students felt that advisers should stick 
to impersonal or academic problems, and not pry into others, 
and two, that academic and emotional counseling should be 
kept separate. 

Reactions on the case-study items were, as was expected,
interesting and revealing. Seventeen students recommended
referring the boy of item 91 to a source
of more specialized psychological care, in most cases medical.
Fourteen, however, wished the adviser to intercede directly
with the parents to get them to understand the boy better.
Other recommendations were largely partial or palliative: financial
aid, so that he could live away from home; assistance in
scheduling; and the like. Four specifically enjoined the adviser
to follow through on what would evidently be a long and difficult
case.

On item 92, psychiatric aid, recommended by 27 students, 
was virtually the only cogent suggestion to emerge. Three stu¬ 
dents, however, recommended a stern attitude. 

Perhaps the most striking characteristic of the responses on 
item 93 was their almost uniform hostility. Only five students 
recommended that the adviser attempt to get the consent of 
the University to the maintenance of the existing, erroneous 
agreement as the student wished, which was, indeed, the course 
of action successfully undertaken in a closely parallel case which 
suggested this item. Nine students urged that the adviser politely
but firmly require the student to take the Natural Sciences
program. Fourteen recommended that the adviser explain
to the student the advantages of the Natural Sciences sequence,
and its greater consonance with the objectives of the College.
Three students recommended an aggressive firmness—one of 
these stating that “a few spankings when he was younger” 
might have helped the student, and another stating that he 





should “know when to keep his mouth shut.” As the item, as 
presented, gives no intimation of the personality of the stu¬ 
dent involved—intentionally so, since this item was chosen to 
measure student reaction to a purely administrative situation 
without clinical aspects—this evidence suggests that many stu¬ 
dents in the College are highly identified with its objectives, 
and highly intellectual character, (7) and are inclined to ex¬ 
empt the College Plan in the abstract, though not the staff or 
administration, from duty as a target for rebellion. 

Great, indeed, is the contrast presented by responses to item 
94. While eight students recommend referring the boy to a 
psychiatrist, and three suggest that he be required to conform, 
seven state that the program must be changed, because not to 
do so would infringe on the boy's freedom of religion; five, in 
this case, suggest direct appeal to the administration to ensure
that the case is not handled legalistically. One student states that
the adviser “must do everything possible to maintain the boy’s 
faith.” Three suggest that the boy be referred to an official of 
his own church. 

On item 95, as might be expected, more of the commentators
were concerned about the possibility
that the student might be coerced, or that his privacy might
be unduly invaded, than were concerned about his ultimate
fate. In the main, they were not hysterically so, and there was
considerable acceptance of the dangers which such a student 
might be piling up for himself. Nine students felt that inter¬ 
ference of any kind was unjustified, or that the behavior of the 
boy was not peculiar. Five, however, thought counseling should 
be given; three, that the student should be introduced around, 
and six, that his old interests should not be discouraged, but 
that he should be led to develop new ones. It should be empha¬ 
sized with reference to this item, as with the other four in its 
group, that students did not make comments unless they wished 
to amplify or reject the five already available to them in the 
instrument; reference to Table 1 will show that most of the 
students were able to accept one of the positions presented in 
the item. 

What inferences do these data suggest, with reference to the
questions raised as to the scope and responsibilities of a college
advisory system? Perhaps the most interesting and suggestive
is the rational picture which students seem to have of the
Advisory System and of its limitations. They recognize that
in a situation providing as varied services as the University of 
Chicago, its function is primarily academic. This would not 
seem to indicate, of course, that students regard the degree of 
insight into the sources of their difficulties which an adviser can 
muster as unimportant; rather, that they do not expect ad¬ 
visers to develop sustained clinical relationships with them. 
Evidence for this comes both from responses to items 16-29 
and from the “case-study” items. 

Nevertheless, many students who are well aware that certain
problems are psychiatric, and that advisers are not psychiatrists,
still consider that the University has a responsibility
to assist them with such problems, and believe the adviser to
be the most appropriate source to which to turn for aid,
doubtless as liaison to professional sources. Note particularly
responses to items 20 and 29.

Students tend to regard as outside the scope of University
service their legal problems (note items 18, 24, and 27), and
perhaps others in which they feel its role would most likely be
punitive (item 23). They do not tend to regard their emotional
problems as, per se, outside the scope of University responsibility
(items 20, 26, and 29).

Students base their opinions of the Advisory System on a
fair amount of information. The mean of 12.2 out of a possible
20 seems high, especially in a sample containing 54 entrants
of 1948 who had been at the University less than a quarter,
and who made a mean score of 10.9 themselves.

Responses seem to support a common-sense view of the ad¬ 
visory function. The unanimity of responses on the case-study 
items seems to indicate very little disagreement among students 
as to what they want from advisers, and the comments seem 
to bear this out. They want warmth, understanding and accept¬ 
ance of their goals and purposes. Where necessary, they want 
intercession on their behalf. They do not want advisers to play 
psychoanalyst at them, but it should be borne in mind that
an adviser who would do so would not be behaving at all as
would a real psychoanalyst attempting to help the same individual.
The students do not, therefore, reject the concept of
therapy, but the possibility of its being used by someone else
to act out his own problems, a good thing for anybody to
reject. Most of them accept as desirable, according to responses
to item 95, the intercession of the adviser on behalf of an aca¬ 
demically successful but troubled student. Many see dangers 
in this, however, and a few react strongly against it. 

There is a remarkably conservative, and, if one may say so,
uncritical and middle-class orientation of student values related
to religious and personal freedom. There is strong, and,
again, perhaps uncritical identification with the College, its purposes,
and what they conceive to be its mores. (2) Evidently,
students do view the system as a part of the total educational 
service of the institution, and expect its functions to be modified 
in the light of, or perhaps even determined by, the institution’s 
purposes. 

Internal criticism of the inferences made is possible, though 
laborious, by a statistical analysis of differences among the 
sub-groups on relevant items. For example, if one reason the 
data cited fall into the pattern observed is that students view 
the Advisory System realistically, and do not attribute to it 
psychoanalytic functions, one would certainly expect of younger 
and less experienced students that they would make choices 
indicating somewhat more dependence than the rest of the 
group. An examination of item 29, response B, reveals that 
56 per cent of students in the 11th and 12th grades, aged 18
and under, regard the adviser as the most appropriate Univer¬ 
sity official to approach with this highly clinical problem, as 
compared to only 37 per cent of the remaining group. This 
difference is significant at the 5 per cent level, yielding a criti¬ 
cal ratio of 2.1. On the other hand, 22 per cent of the younger 
group choose response D, as compared with 31 per cent of the 
remaining group, which is not a significant difference; this, too, 
is perhaps explicable, since a significant difference on this re¬ 
sponse would indicate positive disillusionment with the system 
with growing independence and maturity, which presumably 
does not occur. On item 95, 10 per cent of the younger group 
choose response A, as compared with 24 per cent of the remain¬ 
ing group, a difference yielding a critical ratio of 2.3, and to 




be expected in view of the greater independence of the older 
adolescent or young adult. Fifty-six per cent of the younger 
group, as compared with 48 per cent of the remaining group, 
however, choose response D, a difference in the expected direc¬ 
tion, but not significant and not sufficient to constitute evidence 
that the older group repudiates the assistance of the system in 
solving personal problems. 
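
The critical ratios quoted in this comparison can be checked with a short sketch. The article does not state which formula was used; the version below divides the difference between two independent proportions by an unpooled standard error, an assumption chosen because it reproduces the reported values of 2.1 and 2.3. The function name and the size of the "remaining group" (161 minus the 41 younger entrants) are likewise assumptions made for illustration.

```python
import math


def critical_ratio(p1, n1, p2, n2):
    """Difference of two independent proportions over its unpooled standard error."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


YOUNGER_N = 41           # 11th- and 12th-grade entrants aged 18 or younger
REMAINING_N = 161 - 41   # assumed: all other respondents

# Item 29, response B: 56 per cent of the younger group against 37 per cent.
item_29_b = critical_ratio(0.56, YOUNGER_N, 0.37, REMAINING_N)

# Item 95, response A: 24 per cent of the remaining group against 10 per cent.
item_95_a = critical_ratio(0.24, REMAINING_N, 0.10, YOUNGER_N)

# Item 95, response D: 56 per cent against 48 per cent (reported as not significant).
item_95_d = critical_ratio(0.56, YOUNGER_N, 0.48, REMAINING_N)

print(round(item_29_b, 1))  # 2.1
print(round(item_95_a, 1))  # 2.3
print(item_95_d < 1.96)     # True
```

A ratio of about 1.96 or more corresponds to the 5 per cent level on a two-sided normal test, which agrees with the article's reading of the first two differences as significant and the third as not.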

If the results obtained by applying the instrument described 
at the University of Chicago are representative, then, it seems 
that while students feel that they need warmth and under¬ 
standing and that the University is obligated to provide help 
with personal problems, they are not likely to misuse or over¬ 
burden the source of such help. They will, in general, take as 
much as can be given of what they need. The more psychologi¬ 
cal insight which the Advisers in a system possess, and the more 
clearly the system defines its scope to include service with 
personal problems, the more students will expect of it and use 
it. Some, however, will become frightened and hostile, and 
most expect enough initiative to be left to them to permit them 
to feel respected, rather than manipulated. 

REFERENCES 

1. Blos, Peter. The Adolescent Personality. New York: D. Appleton
Co., 1941.

2. Kelly, Janet A. College Life and the Mores. New York: Bureau of
Publications, Teachers College, Columbia Univ., 1949.

3. Krugman, Morris. “Orthopsychiatry in Education.” Orthopsychiatry,
1923-1948, Lawson G. Lowrey (Ed.). American
Orthopsychiatric Association, 1948.

4. Munroe, Ruth L. Teaching the Individual. New York: Columbia
University Press, 1942.

5. Raushenbush, Esther. Psychology for Individual Education. New
York: Columbia University Press, 1942.

6. Rogers, Carl R. Counseling and Psychotherapy. New York:
Houghton-Mifflin and Co., 1942.

7. The College of the University of Chicago. If You Want an Education,
n. d. (Public statement, released 1949).

8. Zachry, Carolyne B. Emotion and Conduct in Adolescence. New
York: D. Appleton Century Co., 1940.



THE ROLE OF STUDENT GOVERNMENT IN THE 
STUDENT PERSONNEL PROGRAM 


BROTHER LOUIS 

Dean, St. Mary’s College, Winona, Minnesota 

There are so many different interpretations attached to the 
term, student government, that it would seem almost necessary 
to open this discussion with a definition of it. However, I am 
going to sidestep that responsibility, and hope that my defini¬ 
tion of student government will gradually be recognized from 
what I have to say about it. I doubt that a concise definition
could be given which would not be subject to various interpre¬ 
tations. And, so, for the present, by way of preliminary explana¬ 
tion but not as a complete definition, I will say only that when 
using the term, student government, I have in mind a student 
organization composed of the highest elected officers of the 
student body, having very definite and real responsibilities for 
all student life and student activities on the college campus, 
and working in close conjunction with faculty, student body, 
and administration. My comments will be directed towards 
three main points: (1) the place that student government should
have in the total personnel program of the college, (2) the func¬ 
tions it should fulfill, and (3) the conditions that are necessary 
in order that it can effectively carry out these functions. 

Student government must be an essential and integral part 
of the total personnel program of the college because it is the 
one means for accomplishing those aims of the personnel pro¬ 
gram which are related to and achieved by group living and 
group activities. While other areas of college personnel serv¬ 
ices are concerned primarily with the student as an individual, 
the area of student government is concerned with the student 
as a social being, in relation to both the college community 
and the other social environments in which he will live. It is 
the means for unifying all efforts of the college toward the 
education of the student as a social being. Since, then, it is
one of the personnel services, it should have the same recognized
status, the same prestige, and the same freedom to operate 
within its sphere of responsibility as, for example, the health 
service. It should also, by its very nature, have the same all¬ 
pervasiveness with respect to the whole college program as the 
counseling services. 

This implies that the program of student government comes 
directly within the scope of responsibility of that administrative 
officer of the college who has general charge of all student per¬ 
sonnel services. On most campuses this would be the Dean of 
Students. It also implies that the authority and responsibility 
which the student government has are delegated and not abso¬ 
lute; that is, delegated by the administration to the student 
body to be exercised by the elected officers of that body in 
accordance with a constitution accepted by the student body, 
the administration, and the faculty. If correctly understood, 
this places no real restriction on the student government, since, 
just in the same sense, the authority of the Dean of Students 
or the Dean of the College is delegated and not absolute. The 
crux of the matter is really the good judgment of the higher 
administrative officers of the college, who can either make or 
break the program of student government according to their 
attitude toward it. If restrictions are imposed to such an ex¬ 
tent that there is no possibility that the student officers will 
make mistakes, then the program is doomed to failure. 

Briefly, then, the student government is an integral part of 
the student personnel program of the college, and it has a dele¬ 
gated authority which comes to it through that administrative 
officer who has been charged with the general responsibility 
over all personnel services. 

In order to merit and maintain the status that it should have, 
the student government has several important functions to ful¬ 
fill. The major ones, I would classify as follows: 

1. It should have the responsibility for the operation and
control of all student organizations of the college campus.

2. It should have the responsibility for promoting, organizing,
and directing what might be termed “all-college”
functions and programs, that is, those which involve the
whole student body and not just one particular organization
or group.

3. It should have a definite responsibility for the formation
of policies concerning all student life and student activities
of the campus.

4. It should provide the means for achieving mutual under¬ 
standing and close cooperation between students, faculty, 
and administration. 

Each of these functions needs some explanation. Under the 
first of them, the responsibility for student organizations, would 
come the reviewing and approving of constitutions, the setting 
up of standards, the auditing of books, the supervision of social 
and other affairs of these organizations—such as dinners or 
dances—the education of officers of these organizations, the 
authorization of student concessions, the supervision and con¬ 
trol of student publications and student bulletin boards, the 
fostering of wide student interest and participation in the vari¬ 
ous campus organizations, and so on. Much can be done by the 
student government toward the education of officers and mem¬ 
bers of these organizations in their duties and responsibilities. 
Sponsoring and directing leadership workshops open to all stu¬ 
dents is one, providing consultation services is another, and 
developing brochures giving helpful suggestions is a third. Two 
such brochures which I recently received from Washington 
State College are excellent examples of what can be done. One 
is called “Mr. Chairman” and explains the rules of order con¬ 
cisely, yet adequately. The other is called “Officers’ Blueprint” 
and has many good suggestions and recommendations. 

The second general function of student government stated 
previously is the responsibility for what we called “all-college” 
affairs and programs. This would include, first of all, “all-college”
social functions and affairs, such as dances or similar
functions, which are common to all colleges. Other types of 
programs would perhaps vary from campus to campus. As 
illustrations of those for which the student government can 
assume full or partial responsibility I would suggest the follow¬ 
ing: campus activities for the annual homecoming, field days, 
the relations of inter-college student associations with the stu¬ 
dent body, parents’ weekend, the orientation of new students, 
and convocation programs. We can include here, also, the spon¬ 
soring of student forums and inter-college conferences, on stu¬ 
dent problems or on pertinent topics of the day. If the college
has a student union with a union board to direct the activities
centered there, this also, I believe, should be placed under the 
general responsibility of the student government. 

The third general function of student government concerns 
the formation of policies. Conditions of student life and the 
operation of student activities are certainly of great concern to 
the student body as well as to the faculty and administration, 
and policies concerning them are much more effective if the 
students have a voice in their formation. The formation of such 
policies should be a cooperative or joint responsibility of students
and faculty. (The term “faculty” will sometimes be used
loosely here to include both faculty and administration.) Hence, 
a joint student-faculty committee, meeting weekly, is a prac¬ 
tical necessity for this purpose. Such a committee has been set 
up on a number of campuses. The committee should be ap¬ 
pointed by the president of the college, with the student mem¬ 
bers designated by the student government. Its purpose is to 
draw up policies governing student life and student activities 
at the college. It should have the same recognized status as do 
all of the other committees appointed by the president. The
student government should not only designate the student members
of this committee; its approval, as well as that of the faculty,
would also be required before any of the proposed policies are
accepted. It can also assist
the committee by the recommendation of points for incorpora¬ 
tion in the policies to be proposed. 

Considering, now, the last of the stated functions of student 
government, it is obvious that a college educational program 
can operate effectively only in an atmosphere of mutual respect 
and understanding and cooperation between the three major 
groups which compose the college community: students, faculty,
and administration. There must be an opportunity for free dis¬ 
cussion and interchange of ideas between all three. The student 
government furnishes an effective instrument for achieving this 
desired result, if channels are provided for direct approach to 
each of these major groups. 

For contact with the student body, a necessary means would 
be a student convocation, monthly or oftener, conducted entirely
by the student government. Contact should also be maintained
through the medium of student publications. Other
means would be the holding of meetings open to the student 
body, and having an office and definite office hours when stu¬ 
dents can come to present and discuss problems, questions, and 
suggestions. 

For contact with the faculty and administration, there is, 
first of all, the faculty or administrative adviser and the stu¬ 
dent-faculty policy committee mentioned earlier. Other means 
will vary according to local conditions. On our own campus 
several changes have been made as our program progressed. 
Originally we had regular business meetings of a college coun¬ 
cil, composed of the student government and a committee from 
the faculty and administration of those directly concerned with
student problems. While this body had other functions, our 
main idea was to educate the student and faculty groups to an 
understanding of and respect for the viewpoints of the other. 
However, so much has been done to modify viewpoints and 
attitudes of both students and faculty as to make the lengthy
discussions we used to have unnecessary. Questions or problems
which arise now are taken up directly with the proper
administrative officer concerned or with the student-faculty 
policy committee, and settled promptly and satisfactorily. Since 
business meetings of this college council are no longer neces¬ 
sary, we are changing this year to informal luncheon meetings 
in order that the two groups will still keep in close touch with 
each other and so safeguard the harmony in attitudes and 
thinking. 

Originally, too, we held separate meetings also of the faculty 
committee which was part of the college council. This would 
be poor practice if the intention had been to form a united 
front to present before the student officers. Our intention, 
really, was to modify the somewhat extreme viewpoints of one 
or two of our members, and this enabled the joint meeting with 
the student group to proceed more smoothly as a result. We 
have since discontinued this separate meeting of the faculty 
committee. 

Another means which makes for good relations is for the 
faculty and administration to consult with the student government
even on those problems and policies over which they
retain final decision. If changes in policy are explained beforehand,
together with the reasons which influenced the decision,
much better understanding and cooperation on the part of the
students can be achieved. 

A program of responsible student government often requires 
much patience in the beginning. At first there is quite likely to 
be a distrust and sparring for advantage which delays progress. 
There is also, at first, a tendency on the part of students to be 
preoccupied with very petty problems and to ignore the really 
important ones. This is not their fault since they have not had 
any previous education in this respect. However, with tact and 
patience this difficulty can be overcome, and the final result is 
worth the effort. 

It may seem that I have completely ignored the area of 
student conduct and student discipline in relation to student 
government. This is not my intention since I regard it as being 
included under each of the four major functions discussed.
Through the student-faculty policy committee, for example, 
the student government has a very definite voice in formulating 
policies regulating student life and student conduct. And the 
responsibility for activities of student organizations or for the 
whole student body entails a responsibility for student conduct 
in connection with such activities. Further, it is my conviction 
that the student government can also assume the responsibility 
for supervision of student conduct in residence halls. I have 
not placed this as one of the major functions of student govern¬ 
ment because it seems to me that too great a stress on the area 
of student discipline implies a negative rather than a positive 
approach, and can lead to a neglect of other important areas. 
It also surprised me at first that for the most part, the men 
students, at least, do not care to assume responsibility for the 
supervision of student conduct in residence halls unless there 
is considerable dissatisfaction with the way this supervision is 
being handled. At a conference on student government which 
our students sponsored earlier this month, and which was at¬ 
tended by delegates from about twenty-five colleges, this ques¬ 
tion was discussed at some length. According to the report I 
received, only one of the men delegates was strongly in favor of 
having the student government assume responsibility for the 
supervision of student conduct in residence halls. All were very
anxious to have a voice in determining the policies, but, in
general, did not want to go beyond this. 

As a final point, I would like to comment on some conditions 
which are necessary for the effective functioning of any program
of student government. I will pass over the necessity of
having the authority and responsibility of that body clearly 
defined, as being too obvious to need comment. Outside of that, 
the most essential condition is that there be good relations 
between the student body and the faculty and administration. 
If the faculty lacks confidence in the student group and its 
representatives, if it is unwilling to take time to discuss fully 
with them questions and problems of mutual concern, thus 
ignoring the educational possibilities this affords, then the pro¬ 
gram is foredoomed to failure. The result will be that in the 
students’ minds the college community will be composed of 
two rival factions, students versus faculty, each struggling 
against the other. The administration must take all possible 
means to prevent such a situation. Some means which can be 
used have been indicated earlier, but even these will fail unless 
the viewpoint of the faculty and administration is one of re¬ 
spect for and confidence in the student group. 

A second necessary condition for an effective student govern¬ 
ment program is provision for insuring continuity of policy and 
the education of student officers. If each newly elected group 
must start from scratch, there will be no appreciable growth. 
One practice which is good, and is also quite common, is to
have the new members elected early enough so that they can 
sit in at all of the meetings of the present members until the 
time comes for them to take office. Another means which our 
own student government uses seems to me to be even more 
fruitful. In late Spring they take the officers-elect away to a 
camp for a two-day orientation. Several members of the ad¬ 
ministration and faculty are invited also. The time is spent in 
discussing with the newly elected officers the problems which
came before the student government during the past year, the 
solutions arrived at, the policies established, and the projects 
and plans for the future. The specific purpose is the education 
of the new officers in their duties and responsibilities. It really 
works. 

A comment might be made also on the size of the student
government body. This also is a contributing factor to effectiveness.
If the group is too small it becomes ineffective, or is in
danger of becoming the tool of pressure groups. If it is too
large it becomes unwieldy, and tends to lose the “esprit de
corps” which should characterize it. 

This brief discussion of student government does not pretend 
to exhaust the subject. I have attempted simply to explain 
those points which have impressed me most strongly when 
working with students. Many others could undoubtedly be 
added. The one general impression I have from my own experi¬ 
ence with students and their officers is their willingness and 
their ability to show a strong sense of responsibility, to be 
mature in their judgments, and to understand and discuss in¬ 
telligently the problems involved in dealing with people. Capi¬ 
talizing on these student traits goes a long way toward achiev¬ 
ing our educational objectives and toward developing the 
educated leaders we want our college graduates to be. 



STUDENT PERSONNEL WORK AND THE 
NATIONAL STUDENT ASSOCIATION 

GORDON KLOPF 

Chairman, National Advisory Council, N. S. A., University of Wisconsin

I bring you greetings from the National Staff and the Na¬ 
tional Advisory Council of the United States National Stu¬ 
dent Association. It is gratifying to both the Council and the 
Staff that the American College Personnel Association has al¬ 
ways had a place for discussion of the National Student Asso¬ 
ciation on its convention program. This is perhaps rightly so— 
for who should be more concerned about a program affecting 
students in over three hundred colleges than the personnel 
workers in those colleges? Before we explore the role of the 
college personnel program and staff persons in relationship to 
the National Student Association, let us observe what NSA is 
doing and what its future plans are. 

In studying the objectives and programs of NSA, we find a 
great emphasis given to the importance of training students 
for citizenship. To serve this end, NSA is urging the develop¬ 
ment of the campus as a community—a community of students, 
faculty, administrative, clerical and service staff, as well as 
regents, trustees and alumni. To make a community philosophy 
function, students must be represented on major committees 
and boards—particularly those which affect student life. If the 
campus or educational community is to be an educational ex¬ 
perience and a training for citizenship, it must be more real 
than something we put on like “Sunday go-to-meeting clothes.”
In all phases of campus life, an opportunity must be provided 
for democratic processes to function. 

Few institutions have given students an opportunity to play 
a part in the academic planning program. If we are to give the 
student a realistic experience in democratic community plan¬ 
ning, it is essential that we break down some of the distinction 
between the student and the teacher. As Harold Taylor, President
of Sarah Lawrence College, says, “Education is not something
done to students, it is something students and teachers
do together.” Educational planning has been chiefly the impos¬ 
ing of the academician’s point of view upon the student; NSA 
urges greater consideration of the student’s point of view. Presi¬ 
dent Blanding of Vassar says, “Student opinion concerning 
matters which are considered to be the chief responsibility of 
the administration and faculty is extremely important, particu¬ 
larly if presented in a thoughtful, constructive and responsible 
manner.” In California we find President White of Mills College
“deploring the lack of faculty respect for student opinion.”
He thinks NSA should accept the challenge to do something
to generate a greater respect on the part of faculty and adminis¬ 
trative personnel for student opinion. 

NSA has developed, as many of you know, a program of 
student-faculty evaluation. The first edition of the program
describing student-faculty evaluation sold out shortly after it 
was issued. Copies of the second edition are still being ordered 
in large quantities. This publication contains basic principles, 
forms, and procedures which can easily be adapted to the local 
institution. Students are deeply concerned about improving 
instruction—and who knows more about the instruction they 
are receiving than they do? 

Among the pioneer programs in student-faculty evaluation 
were those at the University of Michigan, University of California
and the University of Wisconsin. Recently, one of the
departments at Wisconsin had an assembly with both faculty 
and students in attendance to evaluate the role each played in 
the instructional work of the department. These and similar 
programs have been motivated by the National Student Association’s
work in this area.

An issue which concerns many college administrators is that 
of academic freedom. At the 1949 Congress, the Association 
resolved, “That membership in any political, religious, or other 
organization, or adherence to any philosophical, political, or 
religious belief does not constitute in itself sufficient grounds 
for the dismissal of faculty, failure to rehire, or denial of tenure 
to educators of the United States.” 

In exploring the role of the student in the government of his
community, the NSA has given impetus to a tremendous
interest in student government. A number of excellent publica¬ 
tions in the form of booklets and mimeographed program mate¬ 
rials have been published by the Association. NSA has stimu¬ 
lated the development of student leadership conferences, stu¬ 
dent government clinics, and workshops on the local, regional 
and national level dealing with the role of the student in the 
governing of higher education. 

In promoting the concept of college or university as an edu¬ 
cational community, the NSA realizes that most aspects of a 
community must be governed by trustees, regents, deans, fac¬ 
ulty and administrators. It is urging, however, that student 
opinion and representation be included to a greater degree on 
committees, councils and boards, giving students the oppor¬ 
tunity of expressing their point of view and of having the ex¬ 
perience of working with the staff members of the educational 
community. If we accept the responsibility of higher education 
as being a training ground for citizenship, we need to think of 
the institution as a community-structured unit with students as 
well as staff members as citizens, participating in the planning 
and governing of the community. It is to this end that NSA 
is working. 

The National Student Association is also interested in developing
“concerned citizens” among our students. To implement
this objective it has planned an extensive international pro¬ 
gram. Almost eight hundred students will participate in the 
tours abroad this year. In providing this travel program, NSA 
not only saves the American student hundreds of dollars on 
every tour, but is helping the student to get the maximum from 
his travel experience by making the tour a “study” as well as
“sightseeing” program. The student not only sees the Eiffel
Tower but learns about the people of France through studying 
the French language, the history of the French people, and 
their customs, while on board the ship taking him to Europe. 
In France he meets with student groups and with people of 
France other than the “Cook’s Tour Guide” type of individual.
When the student returns to his campus, NSA has urged him 
to try to give other students some means of benefiting from 
his experiences abroad. NSA has also developed a program of
work camps and has done a great deal to bring displaced per¬
sons to the American campus. It is constantly working with 
other agencies in the international field. Shortly, you will see 
on your desk a copy of Youth and UNESCO, a new publica¬ 
tion which has been published by NSA and UNESCO. NSA 
has been a vital part of the program of the World Student 
Service Fund. Many of you may have heard about the expanded 
role of World Student Service Fund in the program of inter¬ 
national education. NSA has been consulted on this program; 
and, as it takes shape, NSA leaders will be involved in its 
implementation. 

The National Student Association has also urged colleges to 
permit political activities on campuses, including the permis¬ 
sion for speakers of all political views to appear, and the devel¬ 
opment of political organizations on the campus. I think it 
might be said that the NSA agrees with Robert Hutchins when 
he says, 

The policy of repression of ideas cannot work and never has 
worked, the alternate to it is the long difficult road of educa¬ 
tion; to this the American people have been committed. It 
requires patience and tolerance, faith in principles and prac¬ 
tices of democracy, faith that when the citizen understands 
all forms of government that he will prefer democracy and 
that he will be a better citizen if he is convinced than he would 
be if he were coerced.

The program of the Association is interested in developing 
a “socially concerned” student. All phases of the National Student
Association’s Program are aimed at providing experiences
in inter-group and inter-personal understanding. The 1949 Con¬ 
gress certainly served to illustrate the importance of students 
who represented different backgrounds and points of view work¬ 
ing together when a sub-commission dealing with a debatable 
statement of policy refused to present a final draft until the 
students holding an opposing point of view were consulted and 
placed on the committee. Through the regional, state, and 
national conferences, students of all races, religions, political 
backgrounds, geographical regions, social and economic status 
have an opportunity to work together. The national congresses 
have taken definite stands on discrimination in student groups
and have asked member student governments to prohibit organizations
which discriminate against groups of individuals.
To help implement the best in human relations in the Educa¬ 
tional Community, the Association has recently published a 
booklet, Human Relations in the Educational Community. This, 
as well as many other program materials issued by the Associa¬ 
tion, will give students, faculty members, and administrators 
suggestions for meeting the challenge so ably stated by Presi¬ 
dent Charles S. Johnson of Fisk University that 

Unless the American people solve the racial issue they 
face a national defeat from within through loss of faith in their 
very reason for living. We cannot rest now, or turn back the 
tides, or settle the crucial issues by comfortable compromises. 

We can either be courageously righteous in our belief in our¬ 
selves, or adopt an ideology and way of life to fit our insepar¬ 
able sins. 

The National Student Association is also concerned with the 
economic welfare of students. As many of you know, it has 
developed a Purchase Card Plan which has been successful in 
many educational communities. The staff realizes that the plan
is not workable in every community and has developed other 
means of helping students to meet their economic needs. The 
NSA is distributing program materials concerning cooperative 
stores, housing and eating groups. At the 1949 Congress, it 
approved by an overwhelming majority the need for federal 
scholarships to be awarded on the basis of need and ability and, 
recently, the national staff participated in a conference spon¬ 
sored by the American Council on Education in the drawing 
up of a bill for federal scholarships to be presented to Congress.

In concluding this section, which has given you a brief pic¬ 
ture of the program of the Association, I wish to say that I 
agree with President Harold Taylor of Sarah Lawrence College 
that “a lethargy is present in the American student body which 
has resulted from the fact that our college and university ad¬ 
ministrators and faculty have not given sufficient encourage¬ 
ment and opportunity to the participation of the student in 
the total life of the campus.” The National Student Associa¬ 
tion is three years old, and I am sure you will agree with me 
that it has done much to encourage student participation in
the total life of the campus. Part of the success of the Associa¬
tion, however, is in your hands. 

It is important this morning that we also examine the struc¬ 
ture and administrative procedures of the Association. The 
most frequent question asked of the Advisory Council, now 
that administrators are assured the Association is not overrun 
with “fellow travelers,” concerns the matter of cost. Many of 
us will admit the cost has been high and have urged the Asso¬ 
ciation to study the possibilities of reducing membership fees. 
Since the membership has increased, dues have been reduced, 
and are going to be reduced to an even greater degree. However, 
it is important that we compare the cost of membership in the 
National Student Association and the cost of student govern¬ 
ment to other activities on campus. There is hardly a college 
that does not spend more on debate and forensics, with rela¬ 
tively few students participating, than they do on student gov¬ 
ernment or membership in the National Student Association. 
The experience of a student who attends a Conference or the 
Annual Congress is just as important to that student as his 
participating in a debate tournament or a regional forensic 
contest. Are American institutions as willing to spend money 
to educate for citizenship as they are willing to spend money 
to buy band uniforms, train baton twirlers, debaters and 
athletes? 

I believe that, basically, the problem with the National Stu¬ 
dent Association is not the three cents it costs each student on 
the campus to belong, but rather lies within its structure. The 
organization nationally consists of the Student Congress which 
meets annually, the National Executive Committee, composed 
of Regional Representatives, which meets between Congresses, 
the Staff Committee, and Regional Organizations. The weak¬ 
ness in its structure lies in the Regional organization and in 
the local campus channeling. On your local campuses, the per¬ 
son who should be most concerned with the program of the 
Association is your student government president. Because of 
the complexity of his job, he may have assigned the channeling 
and coordination of the NSA materials to a special committee, 
commission or coordinator. However, my experience with local 
campus structures indicates that the closer the president is to
the NSA program, the more he reads and channels program
materials to proper committees, the better the purposes of the 
National Association are being served. Let us take, for example, 
the recent material that Ted Perry, the Vice President in charge 
of Student Life, has sent to the Student Government President 
concerning campus social and recreational programming. Ted 
has developed an excellent collection of materials concerning 
both formal and informal campus recreational programs. When 
the student government president receives this, he should im¬ 
mediately forward it to the Campus Social Committee, the 
Union Dance Committee, the Dormitory Social Committee, or 
whatever Committee or Board is concerned with planning cam¬ 
pus social activities. He might also refer it to the Dean of 
Women or Men, the Student Activities Director or the Dormi¬ 
tory Social Director. I cannot urge you as personnel people too 
strongly to be sure the material that the National Student 
Association distributes is read by the people who should be 
concerned with the particular project. The campus that per¬ 
mits these excellent suggestions to lie on the student govern¬ 
ment president’s desk is certainly not getting its money’s worth 
from membership in NSA. Again, I say it is not the cost factor 
of the Association itself—it is the inefficiency of our own stu¬ 
dent leaders in channeling the NSA program material. 

Another factor is that of leadership in the Association. It has 
frequently been said that “students will be students” and can¬ 
not accept the responsibility of administering a national organ¬ 
ization of the scope of the National Student Association. I 
think it is important that we become convinced, along with 
Dean De Vane of Yale University, that it will not do to under¬ 
estimate the abilities of our young people, and that, if the or¬ 
ganization has not enough in itself to assure its continuation, 
it ought to die. I think we also agree with the college president 
who said that, “We do not want to see an aging secretariat 
grow up in NSA.” However, I think we have to realize that 
the mature leadership of the post-war veteran student body is 
no longer present. We all realize that the American student 
body is not as mature as the student bodies of two years past. 
Our job as personnel workers is to be sure that we encourage our 
local NSA programs and write to the national officers to give
advice and suggestions. Administratively, the Association is 
meeting the problem of continuity of leadership through hav¬ 
ing several of its officers run from February to February and 
others from September to September. If ever there was a need 
for a National Student Association, to develop concerned citi¬ 
zens, it is at the present time. 

Last of all, I would like to refer specifically to the role of the 
National Student Association and the personnel worker. The 
Association is established to achieve many of the same objectives
in which we, as counselors of students, are interested. If
all its printed materials, its hundreds of answers to individual 
letters on local problems, and its regional and national confer¬ 
ences and congresses are fully utilized by your campus, the 
Association can help you achieve the objectives of your personnel
program. The local, regional and national leadership,
however, needs your help. It's up to you to carry out the lines
of the song, “Accentuate the Positive and Eliminate the Negative.”
I have attempted, today, to mention just a few of the
positive contributions and significant objectives of the National
Student Association. Again, I say it is our job, as personnel
workers, to be informed about them and to help the student to
accentuate them.



CONTRIBUTIONS OF THE STUDENT UNION TO THE 
TOTAL PERSONNEL PROGRAM 

(An Abstract) 


DONOVAN D. LANCASTER 

Director, Moulton Union, Bowdoin College, Brunswick, Maine

Among the many personnel services that characterize the 
contemporary American college, the student union is a com¬ 
parative newcomer. Student union goals may be expressed 
simply: 

1. To help provide a recreational program for the student
body.

2. To reduce the cost of going to college by supplying inexpensive
recreation.

3. To further fellowship and understanding by providing an 
opportunity for students of different races and social back¬ 
grounds to meet on an equal footing. 

4. To promote the personal development of the students by 
bringing to the union the best in the arts and by giving 
the student an opportunity to participate in gracious so¬ 
cial gatherings. 

5. To provide a situation where students participate in
self-government and learn to cooperate with others and to
take responsibility.

6. To unify the campus, large or small. 

The bronze plaque at the front entrance of our own Bowdoin 
Union says, “Here the fires of friendship are to be kindled and 
kept burning.” 

What about the administration of these organizations called 
student unions? The most successful unions in this country are 
located on coeducational campuses and are housed in coeducational
buildings. We have seen the utter folly, even within the
last decade, of building separate plants for men and women at 
opposite ends of the same campus. It sounds amusing to us 
today, but it is also tragic. 



Now the driving force within any student union is the director.
While about 85 per cent of our union directors are men,
some of the directors of large successful coeducational unions
are women. I know a number of coeducational unions with excellent
women directors. I shall not go into the merits of this
situation, but I shall take advantage of my position here and
refer to the director as he. The union director largely determines
the union goals because he is on the job day after day and because
he is the manager of the building. He should be in charge
of all personnel, directly or indirectly, within the union. Otherwise,
many times his hands are tied. It is difficult for some
college presidents and business managers to see this point. 

The union director should be advised and assisted in policy 
making by a faculty-student board. Faculty and staff members 
indispensable on such boards include deans of students, student 
counselors, directors of student activities, teachers of psychol¬ 
ogy and allied personnel workers. Here is the great chance for 
personnel officers to make their influence felt. On the other 
hand, in a smaller institution like my own, the union director 
also serves on various boards for the deans’ offices. This is a 
desirable interlocking arrangement. 

Student members are also indispensable on policy-making 
union boards. Here, as almost nowhere else, the undergraduate 
tries his wings in student government and organization. He has
a building, a workshop, a program to direct. Here is democ¬ 
racy really at work. 

The program of the National Association of College Unions 
for nearly ten years has contained papers describing the job of 
the union director and his responsibilities for coordinating his 
program with that of other personnel offices. If he is not doing 
so, it may be because he has been charged by the university 
with the task of making a multi-million dollar building pay for 
itself and that he has little time for anything else. I am sorry 
to say that I think there will be more rather than less tendency 
in the future for university officials to put pressure on the fi¬ 
nancial rather than the personnel considerations in the direc¬ 
tion of college unions. 

During the next few years enrollments will decrease and stu¬ 
dent personnel staffs, including student union staffs, will likely
be reduced. The directors of the many new union buildings
that have been built recently, or those in the process of completion,
are likely to face vexing financial problems. Every effort
must be made, therefore, to avoid overlapping in services and 
to coordinate the union programs with the larger university or 
college personnel programs. Listed below are some important 
steps that might be taken:

1. Centralized recording of the social and recreational interests
of students might overcome the expense of duplicate
records.

2. The student union organization might contribute more
effectively to the freshman orientation program of the 
institution. 

3. The creative arts program of the union might become an 
important laboratory for academic instruction in these 
areas as well as a setting where students may acquire 
recreational skills or vocational tryout experiences. 

4. The student union organization must go far beyond the 
building itself. Returning veterans who have experienced 
the possibilities of successful student union organizations 
on various campuses have often established on their own 
campuses effective programs without buildings or with 
very inadequate facilities. 

For those of you who wish to pursue the whole subject of 
the student union more fully, I suggest that you consult College
Union—A Handbook on Campus Community Centers, by
Edith O. Humphreys. This is the most exhaustive study of
student unions made in America. If you are planning a new 
union building the National Association of College Unions 
stands ready to help you. Inquiries should be addressed to 
Edgar A. Whiting, National Secretary at Cornell University. 



MAJOR ISSUES AND TRENDS IN THE GRADUATE 
TRAINING OF COLLEGE PERSONNEL WORKERS 


W. W. BLAESSER 

Specialist for Student Personnel Programs, U. S. Office of Education

and 

CLIFFORD P. FROEHLICH

Specialist for Training Guidance Personnel, U. S. Office of Education 

What are the major issues and trends in the graduate train¬ 
ing of college personnel workers? We will not attempt in this 
brief paper to present a definitive answer to this question. Our
purpose is to try to stimulate discussion through a rather ar¬ 
bitrary selection of issues and trends. 

A first step in considering this problem of training is to answer
the question, “Who are personnel workers?” We must
have clearly in mind whom we are training before we can talk
about what kind of training they should have. During the past
fifteen years we have had quite a flowering of books, articles, 
speeches and committee reports identifying student personnel 
functions, services and workers. Despite considerable variation 
and, at times, conflicts in our literature and in our practice, 
we now seem to have a fairly common understanding of the 
general scope and functions of personnel workers. We generally 
agree that instruction, business management, public relations 
and maintenance are not personnel work. But when we get to 
specifics, we find the first issue to raise. The following Com¬ 
mittee publications will illustrate the point at hand. 

In 1937 the American Council on Education brochure, en¬ 
titled The Student Personnel Point of View, included health as 
one of 23 student personnel functions. In the 1949 revision of 
this brochure, health functions were again included among the 
17 basic elements of a student personnel program. 

Also, in the 1948 report of the ACPA Committee on Pro¬ 
fessional Standards and Training, it was recognized that health 
services were one of the student personnel functions. Yet the
Committee sidestepped the issue of the training involved by
commenting as follows: 

Two types of personnel services are included in the list of 
functions with which we started, but not in the special train¬ 
ing recommendations. The first of these consists of positions 
for which recognized standards are already set by some ac¬ 
crediting agency. Physicians and nurses in the health service 
. . . would fall into this category... it would seem to be ad¬ 
visable to let the persons in the administrative position under 
whom these activities fall set up standards for them which 
are in accordance with the goals of the program of the par¬ 
ticular institution. 

The issue, then, is whether or not the occupational group of 
personnel workers includes all those who perform personnel
functions. Are nurses serving college students personnel work¬ 
ers? If they are, should they be trained as personnel workers? 
Or is their primary allegiance to the occupation of nursing? 
We have, on the college scene, many persons whose primary 
techniques are derived from other professions, such as nurses, 
social workers, speech correctionists, physicians and clinical 
psychologists. In any list of personnel functions, the services 
rendered by these individuals are usually included. Is person¬ 
nel work really as inclusive as the 23 functions listed by the 
ACE would lead us to believe? Or, is personnel work a small 
nucleus rendering a unique service which is concerned primarily 
with the individualization of education by means other than 
instruction, maintenance and administration? 

This issue leads us clearly to the next, concerning the pro¬ 
fessional status of personnel workers. Is our occupational group 
really a profession? Darley and Wrenn carefully considered 
this problem and proposed eight criteria, against which the 
occupational group could measure its degree of occupational 
professionalization. They concluded that, as a whole, student 
personnel work falls short of professional status, by all of their 
criteria save one; namely, we do have a body of specialized 
knowledge and skills. Of course, the question of whether or 
not we are a profession is largely academic. For the purpose 
of this discussion, it is important to recognize that as an oc¬ 
cupational group, although somewhat ill-defined, we do seem 
to have a set of unique skills and a body of knowledge. This,
we believe, is one of the most important reasons that we, as 
personnel workers, have for considering today the training of 
personnel workers. If we did not have something unique, then 
we could leave our training problems up to other disciplines. 
When we needed workers we would then recruit from other 
disciplines. 

A thoughtful discussion of the uniqueness of our training is 
found in the ACPA report just referred to. This report stressed 
that 


Since all personnel workers have as their central aims the 
welfare of the individual student, and his adjustment to the 
college situation, both in and out of the classrooms, it has 
seemed to us that training for all should be built around a 
common core. This should involve information with regard to 
individuals as individuals, and as members of groups. It should 
also include the development of skill in identifying individual 
needs and problems, and handling interviews and group leader¬ 
ship situations constructively. 

The common core was then outlined in terms of course work, 
along with a general recommendation about the need to include 
supervised experiences. In addition, the report spelled out five 
rather specific groupings of personnel occupations, and indi¬ 
cated the desirable training recommendations for each group. 

So much for this forward-looking report. We need now, for
the purposes of this paper, to make a rough and arbitrary clas¬ 
sification of the majority of training programs available today. 

We shall admit readily that a particular program may not 
fit perfectly into one of these categories. But the classification 
does serve to highlight certain issues. First, there is the
“if-some-is-good, more-ought-to-be-better” type of program. This
training has its primary orientation in counseling, an applied 
branch of the science of psychology. In this program, levels of 
personnel workers are recognized. Those with bachelor’s de¬ 
grees in psychology are considered capable of working as place¬ 
ment interviewers in the College Employment Office. They can 
be resident dormitory counselors while they work on their 
M.A.’s, or they may be preliminary interviewers in a Coun¬ 
seling Bureau. At the next level, the M.A.’s can work in the 
Dean of Students’ office, handling simple discipline, loan funds, 
or they can be counselors in the Veterans Counseling Bureau.
By taking more training in psychology, these persons may earn
a Ph.D. They are then eligible to move up to the top level. 
Here they can become the Dean of Students, or if they are so 
inclined, can get into the teaching end and train more person¬ 
nel workers. This program is characterized by adding more and 
more training in counseling, upon the apparent assumption 
that the higher the counseling skill the better the personnel 
work. 

Now there is a second type of training program, namely, the
“to-each-his-own-specialty” type. These programs recognize
areas of specialization. By enrolling in this type of training program,
students can be trained as vocational counselors. Their
training may duplicate, in part, that of a personal counselor, but 
the training program will tend to accentuate differences, spe¬ 
cial skills, rather than skills common to all personnel workers. 
Earlier in this discussion we pointed out a pertinent example 
of this “area of specialization” approach in which certain spe¬ 
cial groups for which standards were set by some other pro¬ 
fession, were accepted as personnel workers. It is our belief 
that this type of program is based upon the assumption that 
there is not a body of specialized knowledge and skills in per¬ 
sonnel work. Rather, the personnel functions are a conglomer¬ 
ate of occupations, under a single banner, and not a single 
occupation with a variety of specialties. If we carry this belief 
to its logical conclusion, each of us as personnel workers would 
have our primary home in some other professional land. In 
fact, we would think of ourselves as psychologists, or dieti¬ 
tians, or vocational counselors, who just happen to be working 
in a college. 

In some quarters there is strenuous opposition to this second 
point of view. And this opposition has been the motive power 
behind the establishment of a third type of training program, 
namely, the “be-a-generalist, be-an-educator” type. The training
program is quite logically designed to provide a broad basis 
in education. 

The students get this in courses in the principles, history 
and philosophy of higher education; and in methods of educa¬ 
tional supervision and administration. Primarily, this point of 
view stresses the setting in which college personnel workers find 
themselves. 

This point of view, while it recognizes the importance of the
setting, fails to recognize the unique body of knowledge and 
skills which personnel workers should possess. 

These somewhat facetious and critical descriptions of train¬ 
ing programs highlight three of the important elements which 
we believe should be characteristic of every training program. 
College personnel training programs should be designed to pro¬ 
vide for levels of specialization, areas of specialization, and the 
setting in which the students will work. We recognize that real 
problems will arise in organizing a program which takes cog¬ 
nizance of all three elements. These problems can be pointed 
up by considering two types of workers which are now ordi¬ 
narily found in college personnel programs, namely, the coun¬ 
selor and the college nurse. At the present, the college nurse 
is trained under medical auspices. Her status is accepted by 
the medical profession, and to it she owes her primary alle¬ 
giance. She is truly trained to serve in her area of specializa¬ 
tion. However, she receives little, if any, training in the spe¬ 
cialized knowledge and skills of personnel work. Likewise, her 
training for service in the educational setting is neglected. Un¬ 
der the program proposed, this nurse would receive her training 
to the full competency which she now has in her speciality, but, 
in addition, she should receive the common core of training 
which was specified in the 1948 ACPA Committee Report, pre¬ 
viously referred to. She should be trained so that she under¬ 
stands the goals and objectives of educational institutions and 
of the personnel program in them, and of her role in that pro¬ 
gram. With such training, this nurse would not see students as 
a parade of physical disorders, of stomach cramps, and head¬ 
aches, but rather she would keep in mind other possible aspects 
of the student’s adjustment. The student whose stomach is 
upset because of fear of failure would not only get bicarb, but 
he would be referred to the Counseling Bureau. 

What about the counselor? If he were trained in a program 
which fully recognized the necessity of these three elements, 
he, too, would be a more efficient personnel worker. Instead of 
striving for status as a sort of junior psychiatrist, he would 
clearly recognize the uniqueness of counseling as a service in 
the educational setting. He would recognize its contribution 
and its place in the development of the total educational program.
He would recognize the levels of counseling skill that are
possessed by his personnel work colleagues and by his fellow 
educators. And, by the very recognition of levels of counseling 
skills, he would build respect for himself among other staff 
members on the campus. If, for example, he really believed
that college faculty members have a role in the counseling 
process, then he would make intelligent use of such counseling 
skills as they might possess. If he really believed that all of 
the staff members of the institution had a common goal and 
purpose, that of enabling the individual student to achieve 
maximum learning from his total college experience, then he 
could join with them under the banner of personnel work. 

These two illustrations have been cited to point up some of 
the difficulties that are involved in organizing a training pro¬ 
gram for college personnel workers. Recognizing levels of spe¬ 
cialization and areas of specialization, and the nature of the 
educational setting, will do much to produce an adequate train¬ 
ing program. The issues, then, can be stated simply as: Are all 
persons who perform personnel functions to receive at least a 
minimum of training in personnel work? If they are, how shall 
that training be organized so that each may attain competence 
in his specialty? And how can that training be organized to 
provide for workers at various levels of competency? Finally, 
how can the training be planned so that students become fa¬ 
miliar with the setting in which they shall work? 

It is easy to raise issues when you are not charged with the 
responsibility of providing the answers. We find it more diffi¬ 
cult to fulfill the second part of our assignment today, that of 
identifying trends in the training of college personnel workers. 
The spotting of trends is frequently a combination of limited 
observation and pious, hopeful thinking, a mixture which is 
not always known to the mixer. Therefore, the following ob¬ 
servations are offered without full knowledge of the ingredients 
involved but with the hope that they may furnish food for the 
discussion period. 

We believe that one trend is an increasing emphasis upon 
practical supervised experiences, particularly in the training of 
counselors. These experiences appear to be limited to one or 
two types within the collegiate institution. A few personnel
workers are urging that trainees be given a wide variety of ex¬ 
periences before being permitted to follow a specialization. 
Williamson, for example, has urged that counselors be given 
interviewing experiences in community agencies, business per¬ 
sonnel offices, mental institutions, reading clinics, vocational 
guidance clinics, psychotherapy clinics, and in elementary and 
secondary schools as well as in the collegiate setting. The in¬ 
ternship experience, he believes, should be integrated with the 
entire period of academic training, and not just tacked on at 
the end of the formal course-work. A few institutions already 
are moving in that direction. 

Another promising trend appears to be an increasing recog¬ 
nition of the need to analyze training content in terms of actual 
job function, thus lessening the disparity between one's train¬ 
ing and what one actually does on the job. The USES study 
of educational personnel jobs which was reported by the CGPA 
Study Commission at this convention should provide a base from 
which to do more intensive job analysis work. The proposed 
pilot study of CGPA which was also reported on Tuesday 
may provide us with techniques and tools by which we can 
validate training programs against job success criteria. 

Recognizing a third trend, some training programs are now 
providing opportunities for individuals to evaluate and im¬ 
prove their own human relations skills while in training. In 
short, they are being provided with personal counseling ex¬ 
perience, and with group therapy. Please note that we are not 
advocating that all future admissions officers, counselors and 
deans be psychoanalyzed while in training. We are simply saying
that while they are learning the skills and techniques of
personnel work they are also learning to handle their own prob¬ 
lems so they do not interfere with the application of those skills 
and techniques. 

A fourth trend appears to be the development of in-service 
training as a function of student personnel administration, and 
the recognition of the advantages of coordinating this with the 
graduate training programs. We have customarily thought of 
in-service training as a program for graduate students doing 
part-time counseling in the dormitories, or for members of the 
teaching faculty who have agreed to work with students beyond
the boundaries of traditional academic advising. Yet all
of us could profit from a well-conceived, long range in-service 
training program to help us improve existing skills as well as 
to develop new ones on the job. Here too, the training must 
recognize and provide for levels of specialization, areas of spe¬ 
cialization and the settings in which the jobs are being carried 
out. The knowledge to be gained from coordination between 
the full-time in-service and the graduate phases of training 
should be of increasing assistance in narrowing the gaps be¬ 
tween training content and job requirements. 

Finally, a fifth trend seems to be increased emphasis upon 
the philosophy of personnel work. The publication of the Joint 
Committee on Counselor Preparation, in which ACPA participated,
recommended training in the philosophy which undergirds
personnel work. It is clear that a training program needs
a sound and carefully defined philosophical base. We believe 
that there are only a few persons who hold to the mechanistic 
bag-of-tricks approach to personnel work. Personnel work can 
never succeed if its practitioners build their strength upon tech¬ 
nical knowledge to the exclusion of basic human values. 



EMPLOYMENT OUTLOOK FOR THE 1950 CROP OF
COLLEGE GRADUATES 

EWAN CLAGUE 

Commissioner of Labor Statistics, U. S. Department of Labor

College personnel workers this year have a particularly
challenging task: assisting the largest graduating class in the 
Nation’s history to take their place in the national economy. 

About a half million people will receive bachelor’s and higher 
degrees this year, considerably more than last year’s record 
total of 423,000. (The 1948-49 total was nearly one-third higher
than the 1947-48 graduation figure and nearly double the pre¬ 
war peak reached in 1939-40.) 

These large graduating classes, of course, result from the 
post-war boom in college enrollments, stimulated by the G. I. 
training program. Enrollments reached a peak of 2,456,000 in
the fall of 1949, one million higher than the pre-war record. 

The number of students enrolled and the number who get 
bachelor’s degrees will probably drop for several years after 
1950, as the veterans move out of college into the labor market. 
However, the number of master’s degrees and doctor’s degrees 
granted should continue to increase for a few more years. And 
the drop in college enrollments will be only temporary. By the 
late 1950’s, enrollments will begin to rise again, as the first
“war babies” reach college age. The long-run trend for a larger
and larger proportion of young people to continue their educa¬ 
tion beyond high school will also tend to push enrollments up. 

The great majority of young people leaving college in the near 
future, like most graduates of previous years, will seek jobs in 
professional, semiprofessional, and administrative fields. In 
1950, and probably also in 1951 and 1952, many will be unable to
find jobs immediately in the occupations for which they have 
been trained. There are several reasons for this unhappy pros¬ 
pect. The war-time and post-war shortages in a number of occu¬ 
pations have now been filled. The unprecedented numbers of
new graduates will intensify competition for jobs. Furthermore, 
there will probably be somewhat fewer job openings for new 
college graduates in 1950 than in the first post-war years or 
even last year. 

The Nation's economy is currently operating in high gear, 
and it is likely that employment will continue at about the 
present high level for the rest of 1950. However, unemployment 
may increase somewhat, since the American labor force (includ¬ 
ing both employed and unemployed) is growing at the rate of 
600,000 to 700,000 workers a year. This situation presents a
challenge to business and industry: to utilize fully our increasing 
supply of potential workers, produce more goods and services, 
and bring about a rise in our national standard of living. In the 
long run, I am confident this goal will be achieved; but in the 
next year, the atmosphere in which college graduates will be 
seeking jobs is likely to be less favorable than at any time since 
the war. 

Such general observations about conditions in the job market 
obscure widely varying situations. Prospects are excellent in 
some occupations, though, in others, graduates will face stiff 
competition for jobs. 

In teaching, for example, there is at once an acute shortage 
of personnel in the elementary schools and a growing over¬ 
supply at the high-school level. For the current school year, 
only one elementary teacher was trained for every three who 
were needed. On the other hand, four times as many students
completed training for high-school teaching as were required. 
This imbalance in supply exists in nearly every State, creating 
a grave problem both for the schools and for the young people 
concerned. College counselors can help to remedy the situation 
by getting the facts on employment-outlook before prospective 
teachers as early as possible in their college careers. 

Other professional fields in which stiff competition for jobs 
is expected in the next few years include: 

Law: This profession is already overcrowded and likely to 
become more so during the next few years. Twice as many 
lawyers passed the bar examinations in 1949 as in the years 
just before the war; unprecedented numbers are currently en¬ 
rolled in law courses. 

Engineering: In the early 1950’s, the number of graduates
will exceed the number of openings in this rapidly growing
profession. However, after the next few years, the employ¬ 
ment situation for new graduates is likely to improve. 

Chemistry: Competition for positions will be keen during the 
next few years among chemists without graduate training. 
The outlook is better for those with graduate degrees. 

Journalism: The reporting field, always highly competitive,
is likely to become more overcrowded in the early 1950’s. Jobs
will be easier to get with country papers, trade papers, and
house organs than with “dailies.”

Personnel work: Competition is very keen in this field. Em¬ 
ployers are insisting on much higher educational and personal 
qualifications for positions at all levels than in the previous 
five or six years. 

There will probably also be an oversupply of business ad¬ 
ministration graduates. A surplus of new graduates has already 
developed in the field of accounting.

Liberal arts graduates with specialized training or work 
experience will find it easier to get jobs than those with only a 
general undergraduate education. 

Fields offering good prospects for new entrants include: 

Nursing: A shortage exists despite the fact that there are 
more nurses than ever before. The demand for nursing service 
will probably continue to rise.

Medicine and Dentistry: Those able to enter and complete 
training will have good opportunities. However, competition is
very keen for admission to professional schools. Some new 
schools are opening; more are planned for later in the decade. 

Pharmacy: This is a field in which the supply of new gradu¬ 
ates has almost caught up with the demand. It is expected that 
this profession will be overcrowded in the long run if enrollments
in pharmacy colleges continue at present high levels.

Other occupational groups important in health service, such 
as veterinarians, medical x-ray technicians, medical laboratory
technicians, dental hygienists, physical therapists, occupational
therapists, and dietitians are expected to have good opportu¬ 
nities for a number of years. Women with interest in the medical 
field will find many openings in most of these occupations. 

Social work: Current employment opportunities are excel¬ 
lent in all types of positions. The long-run outlook is good for 
workers with graduate training, but those with only undergraduate
training will face increasing competition.

Psychologists with graduate training, particularly in clinical 
work, will find good opportunities in the next year or two.
However, those with only the master’s degree may expect 
increasing competition. Some psychology majors with the 
bachelor’s degree are having difficulty gaining admission to 
graduate training. 




Many 1950 graduates who have taken training for occupations
that are, or soon will be, overcrowded will need your
expert help in adjusting to the situation. 

For some, the best course may be to take a job in a related 
field. Thus, many engineering graduates may be able to put 
their training to use in administrative or technical sales jobs. 

For others, the wisest course will be to continue in school for 
postgraduate work in the same or related fields, in order to 
improve their chances for employment. This is in line with the 
long-term trend toward constantly rising standards of educa¬ 
tional preparation in many occupations. In engineering, for 
example, many people with little, if any, college education used 
to qualify for professional positions on the basis of their practi¬ 
cal experience. Now, it is much harder to do this; most openings 
in the profession are filled by men with bachelor’s degrees, and 
the number of engineers with graduate training, although small, 
is increasing. The same trend toward graduate training can be 
noted in many other professions. In addition, the proportion of 
sales, clerical and administrative occupations for which a college 
education is required or preferred has been growing rapidly. 

Job opportunities in professional and administrative occupa¬ 
tions may be somewhat better for graduates who come out of 
college a few years hence, after the current peak in college 
graduations has been passed. Employment in the professions 
has grown rapidly, from 3.7 million in 1940 to over 4 million in
1949. It may well increase to more than 5 million by 1960.
Employment in administrative occupations has likewise shown 
an upward trend. In addition, many new graduates will be
needed yearly to fill vacancies arising because of death, retirement,
marriage, or transfer to other occupations; probably more
will be hired as replacements than for new jobs. Nevertheless, if
college enrollments increase in line with past trends, there will 
continue to be keen competition for positions in most profes¬ 
sional and administrative occupations. This will be even more 
true if enrollments expand as much as has been recommended by 
some educators. 

Since opportunities will be better in some fields than others, 
students will need realistic information on employment prospects
in different occupations; they should have this before
they enter on a course of training for any field. This information
needs to be up to date. During the past 8 months the
Bureau has been working on a new edition of the Occupational
Outlook Handbook. The information we have obtained from
industry, organized labor, and professional societies in a great
number of fields underscores the fact that the factors affecting
employment trends are constantly changing.

College personnel workers can, of course, do much to see that 
young people have the needed information available to guide 
them in making an occupational choice. They can also contribute
greatly to a solution of the broader problem of overcrowding
of professional and administrative occupations, by
helping students to widen their vocational horizons and encouraging
them to seek employment in a broader range of occupations.



OUR STAKE IN THE OCCUPIED COUNTRIES 


An Abstract 

HAROLD E. SNYDER 

Director, Commission in Occupied Areas, American Council on Education 
Washington, D. C. 

Why should American educators be particularly concerned 
with educational developments in Germany and Austria, in 
Japan and the Ryukyus? What stake, as Americans and as 
college personnel officers, have you in the reconstruction of the 
ex-enemy countries and in the rehabilitation of their youth? 

The answer is a simple one. It consists of three main points 
which I believe to be irrefutable. 

First, the time has passed, if it ever actually existed, when
the well-being of American youth can be assured by the opportunities
provided in our homes and communities, in our
schools and colleges. For thousands of our students two ter¬ 
rible wars and the threat of a third even more terrible have 
wiped out and can wipe out again all of the benefits of our 
excellent educational system, all the splendid advantages with 
which we are trying to provide them. Developments in other 
parts of the world are of direct and vital significance to all 
of us, and particularly to our youth. 

Second, while the happiness and security of American youth 
depend upon many factors, it is particularly essential that a 
concerted effort be made to overcome the effects of the per¬ 
verted Fascist philosophy and education on the minds of Ger¬ 
man and Japanese youth. These virile and technically adept 
peoples must not again be permitted by our disinterest to become
sources of infection, infestation and eventually of aggression
affecting the whole world.

Poverty and unemployment, frustration and disillusionment, 
indifference and indecision can once more cause German and 
Japanese youth to be attracted by the blandishments of totalitarian
propaganda, can turn their despair into hatred, can
make them a threat to the security of American youth. Enlightened
self-interest demands that we be concerned with aiding
the process of educational reorganization and reconstruction
and of democratization in the occupied countries.

Third, world leadership has been thrust upon us. By the
very fact of our occupation of the ex-enemy countries, we have 
assumed a very special responsibility for what happens there. 
In the eyes of the entire world Germany and Japan are proving 
grounds for the democratic principles which we profess, for the 
efficiency of our methods, for the sincerity of our motives. We 
dare not, therefore, fall into the sometimes tempting illusion 
that these countries can be given identical and equal treatment 
with all other countries with which we maintain cultural rela¬ 
tions. The question is not one of favoring our former enemies. 
It is obvious that they must not be coddled. But it is equally 
obvious that if we are to discharge our special responsibilities 
there, and safeguard our national interests, these countries 
must continue for some time to come to receive special atten¬ 
tion. 



PLANS FOR THE NEW INTERNATIONAL CHRISTIAN 
UNIVERSITY IN JAPAN 


An Abstract 

MAURICE E. TROYER 

Vice-President, in Charge of Curriculum and Instruction, Japan International Christian University Foundation

Almost 100 years ago, in 1852 to be exact, the United
States, officially through her Navy, opened feudal Japan to world
trade, industry, and technology. In the 100 years that followed,
Japan learned her lesson well, perhaps too well. Today official
United States is again in Japan to help with the democratization
of her schools and government. Much has already been
accomplished in the reorganization of education and government;
much remains to be done in educating leaders with a
clear understanding of democratic philosophy and processes. 

The New International Christian University now being es¬ 
tablished in Japan, independent of the Allied Occupation Gov¬ 
ernment, has as its dominant aim the education of leaders who 
will look upon academic knowledge and skills, not as ends in 
themselves, but as tools useful in working toward: (a) the so¬ 
cial order that holds sacred the integrity, worth, and welfare 
of the individual, and (b) group processes of thinking which 
provide the basis for enlightened decision and action but which, 
nevertheless, respect and duly protect the rights of individuals 
and minorities to pursue their objectives through constructive 
educational processes. 

The University will open in April, 1952, with one under¬ 
graduate college and three graduate schools. The major pur¬ 
pose of the undergraduate College of Liberal Arts is to experi¬ 
ment with and demonstrate approaches to general education 
appropriate to the needs and life of Japan. Traditionally, spe¬ 
cialization in Japanese education starts at the high-school level. 
General education has been unknown in colleges and universities
of Japan. It is proposed that the program of general education
in I.C.U. will include not only natural and physical
sciences, social sciences, and humanities, but also agriculture 
and homemaking, not to prepare specialists in these two latter 
areas but to bring the contribution of those two areas to the 
life of the campus. The graduate program includes a Graduate 
School of Education, a Graduate School of Citizenship and 
Public Administration, and a Graduate School of Social Work, 
to prepare leaders for three areas of public service: education,
government, and social service.

One of the vice-presidents of this new university is to ad¬ 
minister and coordinate the student personnel stream of ac¬ 
tivity—recruitment, selection, admissions, registration, orienta¬ 
tion, vocational and educational counseling, clinical services on 
problems of social and emotional adjustment, health services, 
housing, student social activities, placement and follow-up. 

Educational leaders in Japan have declared that there is no 
one among the colleges of Japan qualified to handle this position.
It will, therefore, be filled by a highly qualified faculty
member from one of our leading universities in the United 
States, who will also head up the program of graduate training 
in personnel and guidance. This position holds unusual oppor¬ 
tunities for pioneer work in the development of new programs, 
processes, and techniques of guidance in a different, but certainly
not new, culture and language setting.

Plans for the development of the program for the university 
are as follows. The beginning faculty, in April 1952, will consist
of about sixty staff members. Since the major function
of this University is graduate in nature, faculty members will
have completed their doctorate program and at least three-fourths
of them will be persons of recognized status in their
field. About half of the faculty will be Japanese, the other half 
from other countries. 

A number of the non-American members of the Faculty are 
to be selected and brought to the United States on fellowships 
for the academic year 1950-51. About 40 of the faculty mem¬ 
bers, half Japanese and half foreign (non-Japanese), are to be 
selected by June, 1951, at which time they will be brought to 
the United States and assembled on some university campus 



INTERNATIONAL CHRISTIAN UNIVERSITY 


605 

for seven months of planning. This planning session is impor¬ 
tant. A new faculty representing different institutions, coun¬ 
tries, and cultures will need time to think together on the ob¬ 
jectives of this university and to build programs and courses 
which support those purposes. A new system of student records, 
a library, new equipment—these are all problems for study and 
development. 

In the achievement of these purposes of the planning con¬ 
ference, the faculty will have an unusual opportunity to learn 
ways of democracy and Christian Brotherhood in their own 
personal relationships. This is indeed an important pervasive 
objective of this planning period. The Faculty of this new uni¬ 
versity should not unduly confuse their students by discrepan¬ 
cies between what they teach and how they behave in relation 
to each other. 

In January of 1952, these faculty members, together with 
others who will be added in the meantime, will assemble on 
the campus at Mitaka, fourteen miles out of Tokyo, and prepare
to open the university in accordance with the Japanese
academic calendar in April, 1952. 

In the meantime, classrooms, offices, library, and residence 
centers for faculty members and students will be provided 
through a building program costing about three and one-half 
million dollars. Three hundred thousand dollars have been bud¬ 
geted for new books and magazines for the library; more than 
$500,000 for equipment. Financial plans for the university pro¬ 
jected by the Japan International Christian University Foun¬ 
dation in America provide for a reserve of $5,000,000 to be 
used as general endowment and a substantial sum to subsidize 
fellowships and scholarships. 



